Every game has bugs. That’s a fact of life. Sometimes they’re big and fat and hard to miss, but sometimes they’re nasty little buggers that hide inside lines and lines of code and nibble away at your game (and sanity). Joost van Dongen, Tech Lead at Ronimo Games, has had his fair share of bugs, but this was by far ‘the lamest’ he ever encountered while working on Awesomenauts.
“In the past half-year we encountered a really rare bug in the PS3 build of Awesomenauts about once a month. The game would freeze for anywhere between 10 and 100 seconds, and then continue normally as if nothing had happened. We always have a PC connected to collect logs from the game. However, the log printed nothing interesting and simply showed the frame rate, which is printed once per second:
13:48:13 60 fps
13:48:14 60 fps
13:49:21 1 fps
13:49:22 60 fps
From this we concluded that the issue must be in some part of the code that does not log anything. That leaves 99% of the code, and with a codebase of at least 150,000 lines, a bug like that is almost impossible to find.
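The freeze is visible in the log only as a jump between timestamps: 13:48:14 is followed directly by 13:49:21. As an illustration (not something from Ronimo's tooling), a gap like that can be flagged automatically. The sketch below assumes each log line starts with an `HH:MM:SS` timestamp, as in the excerpt above; the function name `find_stalls` and the two-second threshold are made up for this example, and it does not handle logs that run past midnight.

```python
from datetime import datetime


def find_stalls(lines, max_gap_seconds=2):
    """Return (previous, current, gap_seconds) for every timestamp gap
    larger than max_gap_seconds. Assumes lines start with HH:MM:SS."""
    stalls = []
    previous = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip empty lines
        try:
            current = datetime.strptime(parts[0], "%H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap > max_gap_seconds:
                stalls.append((
                    previous.strftime("%H:%M:%S"),
                    current.strftime("%H:%M:%S"),
                    int(gap),
                ))
        previous = current
    return stalls


sample_log = """13:48:13 60 fps
13:48:14 60 fps
13:49:21 1 fps
13:49:22 60 fps"""

print(find_stalls(sample_log.splitlines()))
# [('13:48:14', '13:49:21', 67)]
```

Run on the four lines above, it reports a single stall of 67 seconds, which falls inside the 10 to 100 second range described earlier.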
Since this issue was so rare and we hadn’t encountered it in months, we had closed the book on it. We thought it was an anomaly that we couldn’t find or fix and that had somehow disappeared.
Until we were testing our latest build last Thursday. We were playing on seven PlayStation 3 devkits, when suddenly four of them froze for about 90 seconds. Argh! We thought this really difficult bug had magically disappeared, but now it was back, with a vengeance!
Again, no relevant logging. Yet there was something really interesting nevertheless: the PlayStations were in different online matches, so they weren’t even communicating with each other! So how could they all freeze at the exact same time? We concluded that the only thing they had in common was that they all talk to the same Sony matchmaking servers, so we started investigating all our code related to that. Still, we couldn’t find anything.
So we added a lot more logging, and did some really advanced stuff to get more info on the stack during the freeze (which is difficult to get from an executable that has been stripped of all debugging info), and started playing again. I let the PlayStations perform automatic testing all night, but the bug didn’t occur. Then we played the game for five more hours with the entire team and BAM!, it finally happened again on two consoles!
This time, we had more info and it turned out that the game froze in a different spot on each of the two consoles, and neither spot contained any calls to Sony’s matchmaking servers. In fact, it was in between two logging calls, in a spot where nothing relevant was happening. So we concluded there were only two possible causes: either other threads were hogging the entire CPU (due to how the scheduling system on the PlayStation 3 works, high priority threads can do this permanently), or the logging itself was broken.
So we started experimenting around that, and then we finally found the cause of this ‘bug’: when the PC that is tracking logs goes into sleep mode, the connected consoles freeze a little while later. Once the PC is active again, the consoles continue as well a little later. The PC that was tracking the logs automatically went into sleep mode after nobody had touched it for 30 minutes. This only happened during extensive play-testing, because people normally actually use that PC. So it wasn’t even a bug in our code! ARGH!
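The story does not say how the devkit logging works internally or what Ronimo changed afterwards, so the following is only a generic sketch of one way to keep a stalled log receiver from freezing the caller. It assumes the freeze comes from log calls that block until the receiving side accepts the data. Messages go into a bounded queue and a background thread forwards them; when the queue is full, new messages are dropped and counted rather than waited on. The class name `NonBlockingLogger` and the `sink` callback are hypothetical names chosen for this example.

```python
import queue
import threading


class NonBlockingLogger:
    """Forward log messages on a background thread so log() never blocks.

    `sink` is any callable that delivers one message (hypothetical: for
    example, a function that writes to a socket). It may block for a long
    time; the caller of log() is not affected.
    """

    def __init__(self, sink, max_pending=1024):
        self._sink = sink
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0  # messages discarded because the queue was full
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, message):
        """Queue a message; drop it instead of waiting if the queue is full."""
        try:
            self._pending.put_nowait(message)
        except queue.Full:
            self.dropped += 1

    def _drain(self):
        """Deliver queued messages one by one; only this thread can stall."""
        while True:
            message = self._pending.get()
            self._sink(message)
```

The trade-off is deliberate: while the receiver is asleep, log lines are lost, but the game loop keeps running, and a non-zero `dropped` counter is itself a hint that the problem lies with the receiver rather than with the game.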
This may all seem really obvious in hindsight, but in general when we have a bug/freeze/crash it is in our own code, not in one of the tools we use. With such a big codebase, it is easy to not even think about something else. Also, in the chaos of 14 people playing the game on 7 consoles, it is easy to overlook one specific PC going into sleep mode right before the consoles freeze… To be honest, I still don’t know why not all consoles connected to that PC froze. But I intend to leave it at that…”
Awesomenauts will be out this spring on PSN, XBLA and Steam.