The other day I finally tracked down the cause of, and the fix for, a sporadic bug that had been appearing in our main application at work for a couple of weeks and driving me absolutely nuts. This was one of those bugs where, if anyone asked you what was causing it, you could swear on your life without flinching that it was unreproducible. So what was it, and how did I fix it?

Well, I had a real vested interest in solving this bug, as I had introduced it myself in a fairly substantial overhaul of some of our core booking code. Apart from that, I don’t like failure.

The bug occurred when users performed a specific action (adding some extra items to a product), but every attempt to reproduce it failed, even when I tried the exact same data structures almost immediately after receiving the exception reports. On the surface it looked like a CFC threading issue, but I looked over every inch of the code I’d changed and couldn’t find any threading problems.

So back to square one I went. Extra debugging added to the exception reports confirmed that something a little odd was going on, but the bug was still completely unreproducible. I knew that this was one of only two areas of the application where we use sessions (for various reasons we don’t employ sessions very much in our application), so I figured that the session data was somehow being lost, through timeouts or something similar.

I had almost removed the need for sessions from this feature, and it would only have taken me a few more hours to remove them entirely, but as I was contemplating doing this it hit me: the cause of the exceptions was sessions, but not session timeouts.

Within a minute I had replicated a bug that I had failed to replicate in weeks of trying, and I knew the 10-minute fix I had to apply.

So what was the problem? Users were doing something I had not even thought about but, looking at the error in front of me, clearly should have expected: they were comparing two prices at once in two browser windows. Both windows shared the same session, so adding items to the product in one window overwrote the session data for the product in the other, and vice versa. When they then added extra items to the product in the second window, the error would occur.
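To make the failure mode concrete, here is a minimal sketch of it in Python rather than our actual ColdFusion code. Everything in it is hypothetical: the plain dict standing in for the session scope, the function names, the single `current_product` slot and the idea of keying the data by product ID are all assumptions made for illustration, not the real booking code or the real fix.

```python
# Hypothetical sketch: a plain dict stands in for the server-side session
# scope, which both browser windows share because they share one cookie.


class StaleProductError(Exception):
    """Raised when the session no longer holds the product being edited."""


def view_product_shared(session, product_id):
    # Buggy pattern (assumed): one slot per session, so whichever window
    # acted last overwrites the data belonging to the other window.
    session["current_product"] = {"id": product_id, "extras": []}


def add_extra_shared(session, product_id, extra):
    current = session["current_product"]
    if current["id"] != product_id:
        raise StaleProductError(
            f"session holds product {current['id']}, request was for {product_id}"
        )
    current["extras"].append(extra)


def view_product_keyed(session, product_id):
    # One possible fix (assumed): key the session data by product ID so
    # that two windows on two products never touch the same entry.
    session.setdefault("products", {})[product_id] = {"id": product_id, "extras": []}


def add_extra_keyed(session, product_id, extra):
    session["products"][product_id]["extras"].append(extra)


def reproduce_bug():
    """Replay the two-window sequence against the shared-slot version."""
    session = {}
    view_product_shared(session, "A")  # window 1 opens product A
    view_product_shared(session, "B")  # window 2 opens product B, overwriting A
    try:
        add_extra_shared(session, "A", "insurance")  # back in window 1
    except StaleProductError:
        return True
    return False
```

Replaying the same sequence against the keyed version leaves each product's extras intact, which is why a single-window test never showed the problem: with only one window there is nothing to overwrite.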

I learnt a couple of things from this:

  • No matter how much time we spend building and testing an application, users will find unanticipated ways to interact with it (and possibly break it).
  • There is one more reason why we shouldn’t be using sessions for our application.
  • It is still not clear whether swearing at my computer for days on end is beneficial in fixing difficult bugs.