Finish migrating Objenesis to GitHub

With Codehaus and Google Code closing, I’m happily required to migrate Objenesis and EasyMock.

Both projects have a hybrid hosting setup spread across different platforms. I’ve decided to start with Objenesis, which is easier to migrate. I’ll describe the process in this post. Hopefully it will be helpful to someone, but I’m also looking forward to your feedback to see if I can improve my final setup.

So, Objenesis source code was originally on Google Code. I moved it to GitHub some years ago, mainly because pull requests are awesome. The website is also a GitHub page. Part of the documentation is on the Google Code wiki, and the binaries and the issue tracker are on Google Code as well.

Google being nice folks, they used to provide a nice way to migrate everything to GitHub. But I couldn’t use that because

  • I don’t want the project to end up in henri-tremblay/objenesis. I want it where it is now, in the EasyMock organisation
  • I only want to migrate the wiki and the issues since the sources are already there

So here’s what I did instead.

Issue tracker

The issue tracker was migrated using the IssueExporterTool. It worked perfectly (though it is a bit slow, as advertised).

Wiki pages

At first, I tried to export the entire Objenesis project to GitHub so I could retrieve the wiki pages and then move them into the real source code repository. The result was quite bad because the tables were not rendered correctly. So I ended up migrating each page to Markdown by hand. There were only 3 pages, so it wasn’t too bad.

Binaries

This was more complicated. Maven binaries are already deployed through the Sonatype Nexus. But I needed to migrate the standalone bundles. I’ve looked into three options:

  • GitHub releases
  • Bintray
  • Both (GitHub releases exported to Bintray)

GitHub releases are created automatically when you tag in git, and it seems you can also add release notes and binaries on top of them. I didn’t want to dig too much into it: I knew I wanted to end up on Bintray, and I wanted easy automation. Bintray was easy to use, so I went for Bintray only. Be aware that the way I did the migration is low-tech (but it works).

  1. Make a list of all the binaries to migrate
    • Get them with wget https://objenesis.googlecode.com/files/objenesis-xxx.zip
  2. Create an organisation, a distribution repository and an objenesis package (https://bintray.com/easymock/distributions/objenesis) in Bintray
  3. Add my GPG key to my Bintray account
  4. Upload everything using REST (curl -T objenesis-xxx.zip -H "X-GPG-PASSPHRASE: ${PASSPHRASE}" -uhenri-tremblay:${API_KEY} https://api.bintray.com/content/easymock/distributions/objenesis/xxx/objenesis-xxx-bin.zip?publish=1)
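The procedure repeats the same two URLs for each version, so the commands can be generated instead of typed by hand. Below is a minimal Java sketch that prints the wget and curl commands for a list of versions. It assumes the URL patterns from steps 1 and 4 above, where xxx stands for the version. The class name and the version list are hypothetical, and the credentials stay as shell variables, as in the original command.

```java
import java.util.ArrayList;
import java.util.List;

public class BintrayMigration {
    // Pattern from step 1; "xxx" in the post stands for the version
    static String downloadUrl(String version) {
        return "https://objenesis.googlecode.com/files/objenesis-" + version + ".zip";
    }

    // Pattern from step 4; publish=1 publishes the file right away
    static String uploadUrl(String version) {
        return "https://api.bintray.com/content/easymock/distributions/objenesis/"
                + version + "/objenesis-" + version + "-bin.zip?publish=1";
    }

    // One wget + curl pair per version, ready to paste into a shell
    static List<String> commands(List<String> versions) {
        List<String> result = new ArrayList<>();
        for (String v : versions) {
            result.add("wget " + downloadUrl(v));
            result.add("curl -T objenesis-" + v + ".zip"
                    + " -H \"X-GPG-PASSPHRASE: ${PASSPHRASE}\""
                    + " -uhenri-tremblay:${API_KEY} \"" + uploadUrl(v) + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical version list; the real one is the list made in step 1
        for (String line : commands(List.of("1.0", "2.1"))) {
            System.out.println(line);
        }
    }
}
```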

It works. The only drawback is that no release notes are provided, but they’ve always been on the website anyway.

Project moved

Finally, I’ve set the “Project Moved” flag to target GitHub. This quite aggressively redirects you to GitHub if you try to access https://code.google.com/p/objenesis.

What I learned today

When I created this blog, the first things I wanted to talk about were the things you learn every day while fighting a technical issue. Because if it happened to me, it will probably happen to you.

What I discovered is that a lot of these things are too small to be a blog post, or would just repeat a really nice existing blog post where I found my answer in the first place. And a blog post just telling you to read another blog post is not that useful.

Instead, I've decided to tweet these blog posts each time I come across one. I'll use the #WhatILearnedToday hashtag. If you're interested, you can follow me on Twitter.

See ya!

Microbenchmarking is fun

But time-consuming. Keep in mind that to do this properly, I should have, for instance, read the generated assembly code to explain why one implementation is faster than another. But I don't have that kind of time right now, so I was just toying around.

Anyway, a friend of mine replaced a bunch of String.replaceAll calls with a way longer but way faster implementation based on a for loop. I was happy about it.
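The post doesn't say which replacement the friend's code performed. Purely as an illustration, assume the task is escaping line feeds, where each line feed becomes the two characters \ and n. Here is a minimal Java sketch of both styles. It is not the friend's actual code.

```java
public class ReplaceLineFeeds {
    // Regex-based version: short, but String.replaceAll compiles a Pattern on every call
    static String withReplaceAll(String input) {
        // The Java literal "\\\\n" is \\n for the regex engine: a literal backslash, then n
        return input.replaceAll("\n", "\\\\n");
    }

    // Loop-based version: longer, but a plain pass over the characters
    static String withLoop(String input) {
        int lineFeeds = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '\n') {
                lineFeeds++;
            }
        }
        if (lineFeeds == 0) {
            return input; // nothing to replace, no allocation
        }
        // Each line feed (1 char) becomes 2 chars
        char[] out = new char[input.length() + lineFeeds];
        int j = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\n') {
                out[j++] = '\\';
                out[j++] = 'n';
            } else {
                out[j++] = c;
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        String sample = "first line\nsecond line\n";
        System.out.println(withLoop(sample));
        System.out.println(withLoop(sample).equals(withReplaceAll(sample))); // true
    }
}
```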

To make a long story short, it bounced around the internet a bit and turned into an official challenge on GitHub. All this is nicely summarized (in French) by Olivier Croisier on his blog.

Since I have an interest in performance tuning, I started to play with it. Being annoying, instead of providing my own implementation, I first had a look at the benchmark implementation itself. It was written with JMH (always reassuring to see a benchmark that isn't homemade).

So, first I asked for unit tests to make sure my implementations were accurate. Then I extracted the benchmarked methods into a dedicated class. At first, they were in a static class inside the JMH benchmark, but JMH rewrites that class into something else. To be safe, I prefer to put the benchmarked code where it will live for real.

After that, I noticed an issue. The dataset was randomly generated at the beginning of the benchmark, so each benchmarked method was receiving a different dataset. Not good for getting comparable results. We now generate the dataset first and then run all benchmarks on it.

Finally, I wanted the dataset to be representative of a real file. So I gathered some statistics on a real file to get the usual line length, the occurrence of line feeds, etc. This allowed me to change the generator to produce more realistic data.
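Here is a minimal sketch of both fixes. The data is generated once, with a fixed seed, and shared by every benchmarked method. The generator is driven by statistics measured on a real file. Plain java.util.Random stands in for whatever the challenge repository actually uses. The numbers (average line length, line count, seed) and the uniform length distribution are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class Dataset {
    // Hypothetical statistics; the real ones come from measuring an actual file
    static final int AVERAGE_LINE_LENGTH = 60;
    static final int LINE_COUNT = 1_000;
    static final long SEED = 42L;

    // Built once and shared, so every benchmarked method sees exactly the same input
    static final List<String> DATASET =
            Collections.unmodifiableList(generate(SEED, LINE_COUNT, AVERAGE_LINE_LENGTH));

    static List<String> generate(long seed, int lines, int averageLength) {
        Random random = new Random(seed); // fixed seed: same data on every run
        List<String> result = new ArrayList<>(lines);
        for (int i = 0; i < lines; i++) {
            // Uniform length in [0, 2 * average], so the mean is the measured average
            int length = random.nextInt(2 * averageLength + 1);
            StringBuilder line = new StringBuilder(length);
            for (int c = 0; c < length; c++) {
                line.append((char) ('a' + random.nextInt(26)));
            }
            result.add(line.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        double average = DATASET.stream().mapToInt(String::length).average().orElse(0);
        System.out.println(DATASET.size() + " lines, average length " + average);
    }
}
```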

Now I was set. I must confess, my best result is a plain copy of Cédric Champeau's solution with a tiny tweak: I extracted sub-methods. On my machine (OS X Yosemite, 2.6 GHz Intel Core i7, Java HotSpot 64-bit 1.8.0_40), it's 10% faster. On Travis, it's not that conclusive.

Anyway, the funny part is that I tried lots of things that should have been more efficient but weren't.

  • Putting the regex pattern in a constant should be faster than recreating it every time. No. Pretty much the same result
  • Adding the missing else between two consecutive ifs should save a comparison and be faster. No. If anything, it's slightly slower
  • Using the ternary operator instead of an if can be a bit faster. Not by much, but there is a difference
  • Removing a redundant ++ doesn't change anything
  • Using a new array instead of cloning the input and then writing over that single array. This removes a huge copy. But it's slower. Probably because of page faults
  • Using arraycopy instead of assigning every single character. Much slower
  • Using Unsafe to retrieve, allocate and assign. Also much slower. But I'm pretty sure I can improve this one
As I said, I've not tried to explain the results. I'm just assuming they are valid. Please don't do this at home.
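Here is what two of those variants look like, assuming for illustration a task that escapes line feeds. The first is a precompiled Pattern constant versus String.replaceAll, which compiles its pattern on every call. The second is System.arraycopy versus a per-character loop. This sketch only shows the shape of the code. It says nothing about timing, and timing is exactly where the list above turned out to be counter-intuitive.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Variants {
    // Variant 1a: compile the regex once and keep it in a constant
    private static final Pattern LINE_FEED = Pattern.compile("\n");
    // quoteReplacement makes the backslash literal in the replacement string
    private static final String ESCAPED_LINE_FEED = Matcher.quoteReplacement("\\n");

    static String precompiled(String input) {
        return LINE_FEED.matcher(input).replaceAll(ESCAPED_LINE_FEED);
    }

    // Variant 1b: String.replaceAll compiles the same pattern on each call
    static String recompiled(String input) {
        return input.replaceAll("\n", Matcher.quoteReplacement("\\n"));
    }

    // Variant 2a: copy a run of characters one by one
    static void copyLoop(char[] src, int srcPos, char[] dest, int destPos, int length) {
        for (int i = 0; i < length; i++) {
            dest[destPos + i] = src[srcPos + i];
        }
    }

    // Variant 2b: the same copy with System.arraycopy
    static void copyArraycopy(char[] src, int srcPos, char[] dest, int destPos, int length) {
        System.arraycopy(src, srcPos, dest, destPos, length);
    }

    public static void main(String[] args) {
        String sample = "one\ntwo";
        System.out.println(precompiled(sample).equals(recompiled(sample))); // true

        char[] src = sample.toCharArray();
        char[] byLoop = new char[src.length];
        char[] byArraycopy = new char[src.length];
        copyLoop(src, 0, byLoop, 0, src.length);
        copyArraycopy(src, 0, byArraycopy, 0, src.length);
        System.out.println(Arrays.equals(byLoop, byArraycopy)); // true
    }
}
```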

However, if the results are indeed valid, it seems that JVM optimisations give counter-intuitive results. Sometimes, doing more work is in fact faster.

And of course, if you want to try the challenge, it's open to all :-)



The history of partial mocking

Someone asked me this week for some legitimate reasons to use partial mocking. It's indeed a good question, especially because the EasyMock documentation explicitly says that using it is probably a code smell.

The funny thing about partial mocking is that I invented it to work around issues we had at the time (2004... a long, long time ago in the computer engineering world). It was only later that we discovered some legitimate usages.

You see, when I first started to work on class mocking, it wasn't possible to bypass the constructor yet (or at least, I wasn't yet digging deep enough into the JVM to do so). So, I used a set of heuristics to find which constructor would be the best to use. The algorithm was something like this:
  1. Try to use the default constructor
  2. If it's not there, use the constructor with the fewest parameters and create a mock for each of these parameters
  3. Pray it won't fail when creating the new instance
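The heuristic above can be sketched with plain reflection. This is not EasyMock's historical code. EasyMock created a mock for each parameter, while this hypothetical ConstructorGuesser just passes null or a zero value. The OpensSocket class stands in for the kind of aggressive constructor described below.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.Arrays;
import java.util.Comparator;

public class ConstructorGuesser {
    // Steps 1 and 2: the default constructor has 0 parameters, so "fewest parameters" covers both
    static Constructor<?> pickConstructor(Class<?> type) {
        return Arrays.stream(type.getDeclaredConstructors())
                .min(Comparator.comparingInt(Constructor::getParameterCount))
                .orElseThrow(() -> new IllegalArgumentException("No constructor: " + type));
    }

    // Stand-in argument (assumption): null for objects, a zero-like value for primitives
    static Object placeholder(Class<?> paramType) {
        if (!paramType.isPrimitive()) return null;
        if (paramType == boolean.class) return false;
        if (paramType == char.class) return '\0';
        if (paramType == byte.class) return (byte) 0;
        if (paramType == short.class) return (short) 0;
        if (paramType == int.class) return 0;
        if (paramType == long.class) return 0L;
        if (paramType == float.class) return 0f;
        return 0d; // double
    }

    // Step 3: call it and "pray it won't fail"
    static Object instantiate(Class<?> type) throws ReflectiveOperationException {
        Constructor<?> constructor = pickConstructor(type);
        constructor.setAccessible(true);
        Object[] args = Arrays.stream(constructor.getParameterTypes())
                .map(ConstructorGuesser::placeholder)
                .toArray();
        return constructor.newInstance(args);
    }

    // Hypothetical class without a default constructor
    static class NeedsAName {
        final String name;
        final int size;

        NeedsAName(String name, int size) { this.name = name; this.size = size; }
        NeedsAName(String name, int size, boolean flag) { this(name, size); }
    }

    // Hypothetical class whose constructor does something nasty
    static class OpensSocket {
        OpensSocket() { throw new IllegalStateException("imagine a socket being opened here"); }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        NeedsAName instance = (NeedsAName) instantiate(NeedsAName.class);
        System.out.println(instance.name + " / " + instance.size); // null / 0
        try {
            instantiate(OpensSocket.class);
        } catch (InvocationTargetException e) {
            // The prayer failed: the constructor itself threw
            System.out.println("Failed: " + e.getCause().getMessage());
        }
    }
}
```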
Frequently, this method would fail because some nasty individual had put really aggressive code in a constructor. For instance, I've seen constructors opening sockets...

So, to work around that, I made it possible to select the constructor to use and also to pass real arguments to it. The arguments were not used afterwards. They were just there to keep the constructor from crashing.

But then, something worse happened. Some constructors were calling other methods... And since no expectations were set (of course! The mock doesn't exist yet!), mock creation was failing.

Partial mocking was born to solve that.

I made it possible to prevent some methods from being mocked, so that when the constructor called them, they would behave as usual and I would get my mock. TADA!

Yes, it was ugly, hacky and I wasn't really proud of it but that was the best I could do at that time (remember: 2004).

Not long after, I found a way to bypass the constructor entirely, the same way HotSpot does it during serialization. The main drawback was that EasyMock now only worked on HotSpot, since the code was HotSpot-specific.

The good news is that this code was already used by a bunch of other frameworks, like XStream for instance. So other JVMs were starting to be compliant. I remember asking the JRockit team about it, and two minor versions later, it was there.

Still, that's what led to the creation of Objenesis: the magic library that creates objects without calling the constructor, on any JVM. In fact, Objenesis is not that useful anymore because we have Unsafe.allocateInstance. Unsafe is also OpenJDK-specific, but so many frameworks use it that pretty much everyone has implemented it. However, I'm still using Objenesis for two reasons:
  1. You never know when a JVM won't be compatible with Unsafe
  2. My benchmarks have shown that using JVM-specific code is way faster than using Unsafe
So I'm staying on Objenesis for now (and there is an instantiator using Unsafe, so there's no drawback to using it).
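Here is a minimal sketch of the Unsafe.allocateInstance route mentioned above. It is not Objenesis's API, since Objenesis picks the best instantiator for the running JVM. The sketch reads sun.misc.Unsafe reflectively, which works on HotSpot/OpenJDK-based JVMs. The Hostile class is hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class NoConstructor {
    // Hypothetical class whose constructor must never run
    static class Hostile {
        String name = "set by the constructor"; // field initializers run as part of the constructor

        Hostile() {
            throw new IllegalStateException("imagine a socket being opened here");
        }
    }

    // Unsafe.getUnsafe() rejects application callers, so read the singleton field reflectively
    static Unsafe unsafe() throws ReflectiveOperationException {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        return (Unsafe) field.get(null);
    }

    // Allocates the object without running any constructor or field initializer
    static <T> T allocate(Class<T> type) throws ReflectiveOperationException {
        return type.cast(unsafe().allocateInstance(type));
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Hostile hostile = allocate(Hostile.class);
        System.out.println(hostile.name); // null: the initializer never ran
    }
}
```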

Anyway, back to partial mocking.

So, we are now able to bypass the constructor, and partial mocking was only created because we couldn't. Why is it still there?

There are two main usages. The first and most legitimate one is to test the Template Method Pattern. Let's say you have an abstract base class with abstract methods and concrete methods.
public abstract class BaseClass {
    public boolean beTruthy() {
        // stuff
        boolean b = doSayTheTruth();
        // other stuff
        return b;
    }

    protected abstract boolean doSayTheTruth();
}
You want to test the concrete ones.
  • You could create a fake implementation, but that's annoying. 
  • You could just use a real implementation but then it will make you test more than needed. 
  • Or you could create a partial mock. 
In fact, when creating a partial mock, EasyMock now automatically considers that abstract methods will be mocked and concrete methods won't.

So you could do
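import static org.easymock.EasyMock.*;
import static org.junit.Assert.assertTrue;

// Inside a test method; the test must be in BaseClass's package since doSayTheTruth() is protected
BaseClass myClass = createMockBuilder(BaseClass.class).createMock();
expect(myClass.doSayTheTruth()).andReturn(true);
replay(myClass);
assertTrue(myClass.beTruthy());
verify(myClass);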
How sweet.
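For comparison, here is roughly what the first option from the list above, a hand-written fake, looks like in plain Java. It needs a new subclass, a canned answer and manual call counting for every case. That is the boilerplate the partial mock takes care of. FakeBaseClass and the calls counter are hypothetical names, and the elided "stuff" from BaseClass is left out.

```java
public class FakeImplementationExample {
    // Same shape as BaseClass in the post, without the elided "stuff"
    public abstract static class BaseClass {
        public boolean beTruthy() {
            boolean b = doSayTheTruth();
            return b;
        }

        protected abstract boolean doSayTheTruth();
    }

    // Hand-written fake: returns a canned answer and counts calls by hand
    static class FakeBaseClass extends BaseClass {
        private final boolean answer;
        int calls; // lets a test check that the template method delegated

        FakeBaseClass(boolean answer) {
            this.answer = answer;
        }

        @Override
        protected boolean doSayTheTruth() {
            calls++;
            return answer;
        }
    }

    public static void main(String[] args) {
        FakeBaseClass fake = new FakeBaseClass(true);
        System.out.println(fake.beTruthy() + " after " + fake.calls + " call(s)"); // true after 1 call(s)
    }
}
```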

The other reason is a little bit less legitimate. Let's say you have a class with a bunch of methods calling each other.

You would like to test the class but testing everything at once is nearly impossible.

A good solution is to test the methods one after the other by using partial mocking.
  1. You will test the first method by mocking everything it calls. 
  2. You will then test another method down the stack by mocking everything it calls. 
  3. And so on and so on.
Of course, that's bad code! But if there are no tests, you can't refactor! Step one, the tests. Step two, refactor. And yes, during the refactoring, the need for partial mocking should disappear. But that doesn't mean partial mocking isn't helpful.
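Conceptually, a class partial mock behaves like a subclass where only the selected methods are overridden. Here is a minimal hand-rolled sketch of the step-by-step approach. It uses a hypothetical Invoice class (not from the post) and an anonymous subclass instead of EasyMock, so there are no expectations or verification. It only shows the "real method under test, callees replaced" idea.

```java
public class HandRolledPartialMock {
    // Hypothetical legacy class whose methods call each other
    static class Invoice {
        int total() {
            int subtotal = subtotal();
            return subtotal + tax(subtotal);
        }

        int subtotal() {
            throw new UnsupportedOperationException("imagine a database call here");
        }

        int tax(int amount) {
            return amount * 15 / 100;
        }
    }

    // Like a partial mock: total() stays real, the methods it calls are replaced
    static Invoice withStubs(int stubSubtotal, int stubTax) {
        return new Invoice() {
            @Override
            int subtotal() {
                return stubSubtotal;
            }

            @Override
            int tax(int amount) {
                return stubTax;
            }
        };
    }

    public static void main(String[] args) {
        // Step 1: test total() with everything it calls stubbed
        System.out.println(withStubs(100, 15).total()); // 115
        // Step 2: move down the stack and test tax() for real
        System.out.println(new Invoice().tax(200)); // 30
    }
}
```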

That's all for today. Happy partial mocking. I hope you find this story interesting and/or useful.

Stubbing with fluent-http

One thing I do quite often is to make sure I can launch my application offline. For instance, my application needs to access an external system that is only available from my company's network.

So if I'm offline, it just doesn't work. And I don't like my ideas being put on hold just because I'm not at my desk at work or because the remote service is down.

Normally I would launch an embedded server (usually Jetty) and have it serve my stuff. The trick is to save real responses from the actual server in some files and then serve these files. It's a bit complex to configure, but in the end it works.

I use Spring profiles to turn the fake mode on and off.
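The same save-and-serve trick can also be sketched without Jetty or fluent-http, using the JDK's built-in com.sun.net.httpserver.HttpServer. That is a different server from the ones this post uses. The route, the file and the JSON content below are hypothetical, and the Spring profile switch is left out.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FakeServer {
    // Starts a server on a free loopback port that answers `route` with the saved file
    static HttpServer start(String route, Path savedResponse) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext(route, exchange -> {
            byte[] body = Files.readAllBytes(savedResponse);
            exchange.getResponseHeaders().set("Content-Type", "application/json;charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Minimal GET, used to check that the stub answers like the real server would
    static String get(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = connection.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical saved response; in practice it is copied from the real server
        Path saved = Files.createTempFile("stuff", ".json");
        Files.write(saved, "{\"stuff\":42}".getBytes(StandardCharsets.UTF_8));

        HttpServer server = start("/api/stuff", saved);
        try {
            int port = server.getAddress().getPort();
            System.out.println(get("http://127.0.0.1:" + port + "/api/stuff"));
        } finally {
            server.stop(0);
        }
    }
}
```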

This time, I tried something different. I used fluent-http. It is a really lightweight web server that gets configured a bit like you would with Node.js.

I'm quite happy about it so I thought I should share.

My first usage was to turn a sort-of integration test into a unit test. So I have a test that used to call the real server and is now calling my fake server. I can in fact launch it in both modes. That's useful to make sure my code is still compliant with the actual server implementation.

It looks like this:

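import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import net.codestory.http.WebServer;
import net.codestory.http.payload.Payload;
import org.junit.After;
import org.junit.Before;

// Test class fields; how "fake" gets set isn't shown in the original (hypothetical default)
private boolean fake = true;
private WebServer server;

@Before
public void setup() throws Exception {
    if (fake) {
        initFakeServer();
    }
}

private void initFakeServer() {
    // Answer /api/stuff with a saved copy of the real server's response
    server = new WebServer().configure(routes -> routes
            .get("/api/stuff", requestContent("stuff.json"))
    ).start(9456);
}

@After
public void tearDown() {
    if (fake) {
        server.stop();
    }
}

// Reads a saved response from src/test/data/stuff and wraps it as a JSON payload
private Payload requestContent(String file) {
    try {
        String content = new String(Files.readAllBytes(Paths.get("src/test/data/stuff", file)), "UTF-8");
        return new Payload("application/json;charset=UTF-8", content);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}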
Nice and sweet. Fluent-http can also serve static files directly, based on their extension, if you prefer. But you will then need to match your directory layout with what is normally served by the real server.

Two gotchas though:

  • The content type can normally be deduced from the filename extension. However, the json extension wasn't supported. I made a pull request that was merged today, so it should be fixed soon
  • The dependencies in the Maven pom file are heavier than I was expecting. They might clash with some of yours. I'm currently excluding slf4j-simple, which should probably be in provided scope in the pom

Syntax highlighting in PowerPoint

Today I wanted to copy some code into a PowerPoint presentation (yes, I still think PowerPoint is the best presentation tool there is).

On Windows, it's pretty easy. I used to use Notepad++ to copy the text with its highlighting, and job done.

On a Mac, it seems a bit more complicated. There seem to be some IntelliJ plugins but I had no luck with them.

I ended up using a neat trick. It's not great, but at least it works. One of the main drawbacks is that it always copies the entire file. And PowerPoint seems to have a bug: when you copy something that is larger than the slide, you don't get the "Keep source formatting" option.

Anyway, it's cumbersome but it works.

First you need Pygments, which is installed using pip:
sudo easy_install pip
sudo pip install pygments

Then you can do that:

pygmentize -f rtf Allo.java | pbcopy

It will highlight the file and put it in the clipboard. Tadam!

But I'm still open to any answer that is simpler than that...

UPDATE: It seems that the IntelliJ plugin 'Copy on steroid' is now working on my machine. It just magically copies as RTF text. Really useful!

Extract release notes from Jira

If you are like me, you love automation. But sometimes, software and automation just don't like each other.

Today, I wanted to do a really simple thing: retrieve the release notes for an EasyMock version on Jira. Seems easy enough. Jira has a great REST API that should let me do that in two seconds.

Wrong!

Let's say you want the release notes for version 3.2. There is no REST API for the release notes. There is one for the versions. However, to get the details of a version, you need the version id. Which isn't 3.2 but an internal id. And there is no search available on the versions.

I've decided to keep things "simple" by not using any high-level language. Only bash. I'm pretty sure Gradle could have helped me, but I didn't want to get into that yet.

So, first step, get the id by grepping it from the list of all versions.

version=3.2
# Seems complicated but I'm just adding a backslash before the . in 3.2
escaped_version=$(echo $version | sed "s/\([0-9]*\)\.\([0-9]*\)/\1\\\.\2/")

# Retrieve all versions and the id. Hoping a bit that the fields will stay in the same order
jira_version_id=$(curl --silent "http://jira.codehaus.org/rest/api/2/project/EASYMOCK/versions" | grep -o "\"id\":\"[0-9]*\",\"description\":\"EasyMock $escaped_version\"" | cut -d'"' -f4)
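If you'd rather not rely on grep and cut, the same lookup can be sketched in Java. It keeps the shell version's assumption that "id" comes right before "description" in each version object. Pattern.quote replaces the sed escaping step. The JSON below is a hypothetical excerpt, not a real Jira response.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JiraVersionId {
    // Same assumption as the grep: "id" is immediately followed by "description"
    static Optional<String> findVersionId(String versionsJson, String version) {
        Pattern pattern = Pattern.compile(
                "\"id\":\"([0-9]+)\",\"description\":\"EasyMock " + Pattern.quote(version) + "\"");
        Matcher matcher = pattern.matcher(versionsJson);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical excerpt of the versions response
        String json = "[{\"id\":\"11111\",\"description\":\"EasyMock 3.1\"},"
                + "{\"id\":\"22222\",\"description\":\"EasyMock 3.2\"}]";
        System.out.println(findVersionId(json, "3.2").orElse("not found")); // 22222
    }
}
```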
Good. Now I have the version id. Sadly, the API doesn't give the release notes for this version. But the Jira GUI does. When doing it by hand, you click on the version, then on the release notes, then you select the format wanted (text or html), and finally you reach a page with a text area showing exactly what I want.

But, of course, there's no way to get only the content of the text area. So I now need to extract it from the page

# Get the page
release_notes_page="http://jira.codehaus.org/secure/ReleaseNote.jspa?version=${jira_version_id}&styleName=Text&projectId=12103"
release_notes=$(curl --silent "$release_notes_page")

# Extract the content of the text area (dropping the lines holding the textarea tags)
echo "$release_notes" | sed -n "/<textarea rows=\"40\" cols=\"120\">/,/<\/textarea>/p" | grep -v "textarea"
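Here is a rough Java counterpart of the sed/grep extraction, assuming the same textarea markup as the shell command. Unlike the sed range, it keeps any text sitting on the same line as the tags. Like the shell version, it does not unescape HTML entities. The sample page is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReleaseNotesExtractor {
    // Non-greedy match between the tags; DOTALL lets "." span line breaks
    private static final Pattern TEXTAREA = Pattern.compile(
            "<textarea rows=\"40\" cols=\"120\">(.*?)</textarea>", Pattern.DOTALL);

    static Optional<String> extract(String page) {
        Matcher matcher = TEXTAREA.matcher(page);
        return matcher.find() ? Optional.of(matcher.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical page, shaped like the one the shell command expects
        String page = "<html><body>\n<textarea rows=\"40\" cols=\"120\">\n"
                + "Release Notes - EasyMock - Version 3.2\n** Bug\n</textarea>\n</body></html>";
        System.out.println(extract(page).orElse("no text area found"));
    }
}
```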

Et voilà!

BTW, there is an issue open in Atlassian's bug tracker asking for this exact feature. Please vote for it.

My keyboards

I’ve lived in France for 14 years. I loved the food. I loved the wine (and learned a lot about it). Even the people are nicer than you would expect. I also partly picked up the accent. Not because I wanted to. It just happened.

However, there is one thing I’ve always refused to adapt to: the AZERTY keyboard. It drives me insane. Mostly because

  • The dot needs a shift key
  • The numbers need a shift key
  • The M is really far, you can’t do it with your left hand while pressing the Windows key
  • It’s impossible to put an accent on a capital letter (“Yes, but we never put accents on capital letters.” “Yes, but in Québec we do. So in Québec, "élodie".toUpperCase().toLowerCase() brings you back to "élodie", as expected.”)
  • And last but not least, parenthesis, brackets and square brackets are not opened and closed with the same hand
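The Québec argument is easy to check in Java. Here is a minimal sketch. Locale.ROOT is used so the result doesn't depend on the machine's default locale, and the accented letter is written as a Unicode escape to avoid source-encoding issues.

```java
import java.util.Locale;

public class AccentRoundTrip {
    // Upper-case, then lower-case again; Locale.ROOT avoids surprises such as the Turkish dotted i
    static String roundTrip(String word) {
        return word.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String elodie = "\u00e9lodie"; // "élodie"
        // The accented capital keeps the information, so the round trip is lossless
        System.out.println(roundTrip(elodie).equals(elodie)); // true
        // Without the accent on the capital, the é cannot come back
        System.out.println("ELODIE".toLowerCase(Locale.ROOT).equals(elodie)); // false
    }
}
```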

Since I knew all this, I came to France with two Canadian French keyboards. The best keyboard for coding is US English. But if you also want to write in French, the Canadian French layout is the best compromise.

One keyboard was used at home, one at work. I swapped them after 4 years because the work one was too worn, as you will see in the picture.

Keyboard

Both are now in the same state. After 7 years, I bought two new ones. Sadly, the quality was not the same: the keys were too stiff to press. So I’ve replaced my work keyboard with a Logitech K800. At home, I’m typing on my laptop.

I took the picture out of nostalgia before coming back to Montreal and leaving the two well-used keyboards behind.

Having a Mac

I normally use Windows and, once in a while, Ubuntu. I'm one of the rare people who love Windows 8. It is by far the best version, mainly because I'm a power user. I never use the tiled start screen. It's like on a Mac: you should never use the application bar, you should use Spotlight. On Windows, you should use Windows+Q and Windows+W to find applications and settings. As soon as you've learned these two shortcuts, life is good.

Back to the Mac. These days, Windows PC manufacturers are selling bad hardware. SSDs are rare, battery life is bad, screens are OK (not wow), etc. etc... Macs are, strangely, the best deal you can get. For instance, a MacBook Air is the cheapest computer on which to run Windows efficiently. Doh.

So I ended up with a Mac and decided that before formatting it, I would try OS X (Mavericks), just in case. Some stuff is good. For instance, it's really fluid and it wakes up instantaneously.

However, the guy who designed the windowing should be exchanged for a hostage held by IS. That should slow them down. I will spare him if he agrees to implement the following suggestions:

  • Cmd+Tab should bring the requested window to the front. Always. It doesn't matter if the window is minimized, in the background or whatever. Just bring it to the front
  • Cmd+Tab should let me choose between the multiple windows of an application. Or at least some other shortcut should. Being forced to go to the Window menu is not a good option
  • Home should always go to the beginning of the line, End to the end. There's no point in randomly changing this behavior depending on the application
  • Also, Cmd+Left or Fn+Left should always go to the beginning of the line. Cmd+Right or Fn+Right should go to the end. Pick one, stick with it and make it work everywhere
  • I should be able to lock my workstation immediately but also to have it lock automatically after 15 minutes. Those are two different needs
  • Why have virtual desktops only for maximized windows? Why can't I create as many desktops as I want and put whatever I want on them?
  • Being able to dock windows on the left or right natively would be nice. I currently use BetterTouchTool for that
  • We should really be able to hibernate for real, even if it means that the computer wakes up more slowly in that case. Nothing is worse than leaving the computer to rest for the night and discovering in the morning that the battery magically ran out
  • The Windows ca-fr keyboard was perfect. Why move the brackets and square brackets to somewhere I can't reach them? And don't get me started on AZERTY on the Mac. It's a great feat to turn the worst keyboard on Earth into something even worse...
That would be a good start. I've also replaced the shell but I will talk about that another time.

New EasyMock website is out!

After years of having the most vintage open source website out there, the EasyMock team is proud to announce that the new website is up and running.

Please tell us what you think: directly on the pages using Disqus, in the comments of this blog post, on Twitter (@henri_tremblay) or on the mailing list.

Next steps:
  • Be able to deliver more quickly
  • Deliver EasyMock 3.3 ASAP!
Meanwhile, enjoy the website.