Sunday, March 23, 2014

Detour: Fixing a Segfault

When playing with the demo version of EMI (which you can get for free from the ResidualVM project), I found that pressing F1, which usually brings up a menu, pauses the game but shows no menu. While this might be the intended behavior, pressing Esc at that point causes ResidualVM to crash. Let's explore this crash and figure out how to fix it!

First, we'll need to restart ResidualVM with a debugger. Start up ResidualVM with the command:
  • gdb ./residualvm
When the prompt comes up, type run to begin running ResidualVM. Start the game as usual, and then trigger the crash. This time, when the crash occurs, the gdb window shows that the debugger has caught the error and stopped execution:
[Image: Investigating the ResidualVM Crash]
We can see that there is an error from ResidualVM itself, warning us that there was a null value in the Lua engine from the 'gettable' tag method, and also the actual crash in ResidualVM which follows that.

It's helpful to see how we got here, and GDB lets us do that by checking the backtrace. Type bt to see the backtrace, which is a list of the function calls that brought us to this location. Typing up or down moves the current position up or down in the stack, letting us look at different variables at each call. To print out a variable's value, use the print or p command followed by the name of the variable you're interested in. It's also sometimes useful to see the code that preceded the error. Using the list or l command will print the code around the current position in the stack.

[Image: Finding the Error with GDB]
After some exploration, we see that we dereferenced the g_registry variable when it was null, causing the segfault.

Now that we know what's causing the problem, let's look at the engine code for the registry functions and see if we can identify why this is breaking in the demo.

In engines/grim/grim.cpp (line 87) we see that the registry is initialized when the engine is started and the game type is GType_GRIM. In engines/grim/detection.cpp (line 413) we see that the game type for the EMI Demo is GType_MONKEY4. So, the registry is never created for the demo, and because of that, the g_registry variable is never set.
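To make the cause concrete, here is a minimal C++ sketch of the situation described above. It is not the actual ResidualVM code: the Registry struct, initEngine, and registryAvailable are hypothetical stand-ins invented for illustration, and only the names GType_GRIM, GType_MONKEY4, and g_registry come from the engine.

```cpp
#include <string>

// Game type tags as named in the post; the real enum has more members.
enum GameType { GType_GRIM, GType_MONKEY4 };

// Hypothetical placeholder for the engine's registry class.
struct Registry {
    std::string get(const std::string &key) const { return "value-for-" + key; }
};

// Global registry pointer; it stays null unless the engine sets it.
Registry *g_registry = nullptr;

// Hypothetical init routine mirroring the behaviour described above:
// only Grim Fandango gets a registry, so EMI leaves g_registry null.
void initEngine(GameType type) {
    static Registry registry;
    if (type == GType_GRIM)
        g_registry = &registry;
}

// Helper for this sketch: reports whether the registry was ever created.
bool registryAvailable() {
    return g_registry != nullptr;
}
```

Calling initEngine(GType_MONKEY4) in this sketch leaves g_registry null, so any later g_registry->get(...) without a check would dereference a null pointer, which is the kind of crash we saw in gdb.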

The registry for Grim Fandango holds settings and values that would normally be found in the Windows registry. As of now, there are no EMI-specific registry options set up, and the registry is specific to Grim Fandango. Although we'll probably need to implement something like this later on, for now, let's fix the segfault by preventing the code from dereferencing g_registry when it's null.

Searching through the code, we find that the only accesses to g_registry that don't check for a null value are in engines/grim/lua_v1.cpp in the functions:
  • Lua_V1::ReadRegistryValue
  • Lua_V1::WriteRegistryValue
  • Lua_V1::postRestoreHandle
Looking back at our backtrace, we see that the code did indeed go through the ReadRegistryValue function before segfaulting. Since the code in postRestoreHandle that accesses g_registry is only used in games with the GType_GRIM tag, we can safely ignore this instance, as those games will always have the g_registry variable set. With checks added in ReadRegistryValue and WriteRegistryValue to make sure g_registry is not null before performing registry actions, the bug is fixed and ResidualVM no longer crashes. I also added a warning to let anyone else know that there was an attempt to access a registry that didn't exist. To fix this bug properly, we should override these functions so that they point to the EMI_Registry instead. For now, we'll stick with just fixing the segfault.
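The guard described above might look roughly like the following sketch. This is not the actual patch to lua_v1.cpp: the Registry struct, the free functions readRegistryValue and writeRegistryValue, their signatures, and the use of std::cerr for the warning are all assumptions made so the example stands alone. Only g_registry is a name taken from the engine.

```cpp
#include <iostream>
#include <map>
#include <string>

// Hypothetical stand-in for the engine's registry: a simple key/value store.
struct Registry {
    std::map<std::string, std::string> values;
};

// Null whenever the running game never created a registry (e.g. the EMI demo).
Registry *g_registry = nullptr;

// Guarded read: warns and returns false instead of dereferencing a null pointer.
bool readRegistryValue(const std::string &key, std::string &out) {
    if (!g_registry) {
        std::cerr << "WARNING: registry read of '" << key
                  << "' attempted, but no registry exists\n";
        return false;
    }
    auto it = g_registry->values.find(key);
    if (it == g_registry->values.end())
        return false; // key not present
    out = it->second;
    return true;
}

// Guarded write: same null check before touching the registry.
bool writeRegistryValue(const std::string &key, const std::string &value) {
    if (!g_registry) {
        std::cerr << "WARNING: registry write of '" << key
                  << "' attempted, but no registry exists\n";
        return false;
    }
    g_registry->values[key] = value;
    return true;
}
```

With g_registry left null, both functions in this sketch print a warning and return false rather than crashing. Once g_registry points at a Registry, they read and write values normally.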

We have now fixed the segfault, but the game is left stuck at this point, which prevents the demo from continuing further. In the next post, we'll discuss how to fix this issue.
