Microbenchmarking is fun

But time consuming. So always remember that I should have, for instance, read the assembly generated code to be able to explain why an implementation is faster than another. But I don't have that kind of time right now. So I was just toying around.

Anyway, a friend of mine has replaced a bunch of String.replaceAll with a way longer but way faster implementation based on a for loop. I was happy about it.

To make a long story short, it has bounced a bit around the internet and has created an official challenge on github. All this is well resumed (in French) by Olivier Croisier on his blog.

Since I have an interest in performance tuning, I've started to play with it. Being annoying, instead of providing my own implementation, I first had a look at the benchmark implementation itself. Done with JMH (always reassuring to see a benchmark that isn't home made).

So, first I asked for unit tests for make sure my implementations were accurate. Then I've extracted the benched methods in a specific class. At first, it was a static class inside the JMH benchmark. JMH is rewriting this class into something else. To be safe, I prefer to put the benched code where it will be for real.

After that, I notice an issue. The dataset used was randomly generated at the beginning of the benchmark. So each benched method was receiving a different dataset. Not good to make comparable results. We are now generating a dataset first and then run all benchmarks on it.

Finally, I wanted the dataset to be representative of a real file. Some I made some statistics on a real file to get the usual length of lines, occurence of line feeds, etc. This allowed me to change the generator to create a better result.

Now I was set. I must confess, my best result is a plain copy of Cédric Champeau's solution with a tiny tuning. I made sub-methods. On my machine (OS X Yosemite, 2.6 GHz Intel Core i7, Java HotSpot 64 bits 1.8.0_40), it's 10% faster. On Travis, it's not that conclusive.

Anyway, now the funny part is that I tried lots of things that should have been more efficient but that are not.

  • Putting the pattern of the regex in a constant should be faster than only recreating it all the time. No. Pretty much the same result
  • Adding the missing else in two subsequent ifs should prevent a comparison and be faster. No. It's almost slower
  • Using the ternary operator instead of a if can be a bit faster. But not much. But it is different.
  • Removing a ++ that was redundant doesn't change anything
  • Use a new array instead of cloning and then writing over a single array. This removes a huge copy. But it's slower. Probably because of page faults.
  • Using arraycopy instead of assigning every single character. Much slower
  • Using Unsafe to retrieve, allocate and assign. Also much slower. But I'm pretty sure I can improve this one
As I said, I've not tried to explain the results. I'm just assuming they are valid. Please don't do this at home.

However, if the results are indeed valid, it seems that the JVM optimisations give counter-intuitive results. Sometime, doing more things if in fact faster.

And of course, if you want to try the challenge, it's open to all :-)