Tuesday, March 25, 2014

Detour: Fixing the Segfault - Part 2

Continued from the previous entry

With the segfault resolved, the bug no longer crashes ResidualVM, but instead, the game gets stuck, preventing the player from continuing. In the game log, we see some messages that might help us to determine the problem:

New Error in the Log, Exposed by Fixing the Segfault
In this error, we first see that the Lua interpreter is printing lua: (null). This indicates that a value was unexpectedly null. The Active Stack tells us where this error occurred, much like the backtrace in the previous entry, but for the Lua script. Finally, we see the warning that we added in the previous entry, telling us that the script tried to read the registry key SpewOnError, which no longer causes a segfault because of our fix. So that's good!

In the stack trace, we see that this failed on a call to a tag method named (helpfully) function, so let's see if we can find that code. In the EMI Demo, the scripts of interest are located in the MagDemo.lab file. Following the same directions as before, unlab the file and delua the Lua scripts.

Inside _options.lua, there's a comment with the original line number, 1503. Searching for that, we find that the function called was main_menu.return_to_game, which helps explain why things are getting fouled up: the game is trying to return from a main menu that never appeared. In the original demo on Windows, pressing F1 does not bring up the main menu; it does nothing, while pressing ESC skips the cutscene.

It appears that the game is in a wrong state, but it would be helpful to have more information about the problem and more details as to what was run. Let's enable debugging information in ResidualVM to see if there's anything else that can help us track this down.

In ResidualVM, there are debug flags that can be enabled from the command line, like this:
  • ./residualvm --debugflags=<flag list separated by commas>
In addition, there's an in-game debug mode you can enter by pressing CTRL-D. This brings up a console from which you can turn on debug flags. In both cases, you can enable individual classes of flags instead of all of them if you're working on a specific area of the engine.
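To illustrate what enabling individual classes of flags means, here is a minimal C++ sketch that turns a comma-separated list into a bitmask of debug channels. The channel names and bit values are hypothetical, made up for the example; they are not ResidualVM's actual channel list or its debug manager API.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical debug channels, one bit each (not ResidualVM's real list).
enum DebugChannel : uint32_t {
    kDebugLua    = 1u << 0,
    kDebugActors = 1u << 1,
    kDebugMovie  = 1u << 2,
    kDebugAll    = 0xFFFFFFFFu
};

// Turn a list such as "lua,movie" into a bitmask.
// Unknown names are reported on stderr and skipped.
uint32_t parseDebugFlags(const std::string &list) {
    static const std::map<std::string, uint32_t> channels = {
        {"lua", kDebugLua},     {"actors", kDebugActors},
        {"movie", kDebugMovie}, {"all", kDebugAll}};
    uint32_t mask = 0;
    std::stringstream ss(list);
    std::string name;
    while (std::getline(ss, name, ',')) {
        auto it = channels.find(name);
        if (it != channels.end())
            mask |= it->second;
        else if (!name.empty())
            std::cerr << "unknown debug flag: " << name << '\n';
    }
    return mask;
}

// A message on a given channel is printed only if that channel's bit is set.
bool isChannelEnabled(uint32_t mask, uint32_t channel) {
    return (mask & channel) != 0;
}
```

For example, parseDebugFlags("lua,movie") sets only the Lua and movie bits, so messages on the actors channel stay quiet.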

First, let's tackle the part of the bug where pressing ESC doesn't skip the cutscene. With all of the debugging messages on, we get these messages in the log when we press ESC during the opening cutscene:

Debug Output of Pressing ESC in the Cutscene
Following the path of execution, we see a series of calls to GetControlState(). This function returns the state of the key passed in. From common/keyboard.h, we see that the keys it's checking are:
  • KEYCODE_LCTRL (306)
  • KEYCODE_LALT (308)
  • KEYCODE_BACKSPACE (8)
  • KEYCODE_LCTRL (306)
  • KEYCODE_LALT (308)
From this sequence, it appears that the script being run is the SampleButtonHandler (in _control.lua), which makes the first three calls, followed by the CommonButtonHandler (also in _control.lua), which makes the last two.
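Mapping the numbers in the log back to key names is just a lookup against the values in common/keyboard.h. This small C++ sketch hard-codes only the three keycodes listed above; the helper names are hypothetical.

```cpp
#include <string>
#include <vector>

// Keycode values as they appear in common/keyboard.h (subset only).
enum KeyCode {
    KEYCODE_BACKSPACE = 8,
    KEYCODE_LCTRL     = 306,
    KEYCODE_LALT      = 308
};

// Map a keycode number from the debug log to its name.
std::string keyName(int code) {
    switch (code) {
    case KEYCODE_BACKSPACE: return "KEYCODE_BACKSPACE";
    case KEYCODE_LCTRL:     return "KEYCODE_LCTRL";
    case KEYCODE_LALT:      return "KEYCODE_LALT";
    default:                return "unknown(" + std::to_string(code) + ")";
    }
}

// Decode the sequence of GetControlState() arguments seen in the log.
std::vector<std::string> decodeLog(const std::vector<int> &codes) {
    std::vector<std::string> names;
    for (int code : codes)
        names.push_back(keyName(code));
    return names;
}
```

Running decodeLog({306, 308, 8, 306, 308}) yields the five names in the order listed above; by the reading of the scripts described here, the first three come from SampleButtonHandler and the last two from CommonButtonHandler.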

We then see that the script reported that the Override key was hit. This code is in the CommonButtonHandler, which then calls the call_override function, found in the _system.lua file. This function is supposed to stop the current script if the system override is active, so let's check the value of system.override.is_active.

Using the console, type:
  • lua_do if(system.override.is_active) then PrintDebug("Active") else PrintDebug("Inactive") end
After starting the game up again, running this command and checking the log, we find that the override is inactive. So the call to call_override does nothing: control is handed back to the button handler and the movie continues. This matches the behavior we see when playing the game: the ESC key is ignored.

Also, in the previous screenshot, we see a repeated series of functions:
  • function: IsMoviePlaying()
  • function: break_here()
These function calls either come from a function in the _system.lua script called wait_for_movie which does these checks repeatedly until the movie is finished, or from the StartMovie function which contains similar logic.

In the full version of the game, _cut_scenes.lua calls EscapeMovie when the override key is pressed during playback. In the demo, RunFullscreenMovie is much simpler and lacks this logic. Now look at the demo's BOOTTWO function, which is part of the game scripts' startup sequence: it calls a function named StartMovie, which begins playing the intro movie. So the demo doesn't use RunFullscreenMovie at all! We can confirm that there is a movie named intro by checking the movies directory, so we can be sure that this is the code that starts the demo and plays the movie.
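The control flow described above can be modeled in a few lines of C++. This is not the game's code: the functions are named after the script functions (call_override, IsMoviePlaying, break_here, EscapeMovie), but the structs, the frame counter, and the escape callback are assumptions made for illustration. The model shows why, with the override inactive and a wait loop that only polls the movie state, pressing ESC has no effect on the demo's intro.

```cpp
#include <functional>

// Stand-in for the script table system.override.
struct Override {
    bool is_active = false;
};

// Stand-in for a movie that ends after a fixed number of frames.
struct Movie {
    int framesLeft;
    bool isPlaying() const { return framesLeft > 0; }  // cf. IsMoviePlaying()
    void step() { if (framesLeft > 0) --framesLeft; }   // cf. break_here()
    void stop() { framesLeft = 0; }                     // cf. EscapeMovie
};

// Modeled on call_override: it only acts when the override is active;
// otherwise nothing happens and control returns to the button handler.
bool callOverride(const Override &ov) {
    return ov.is_active;  // true would mean "stop the current script"
}

// Modeled on wait_for_movie / the demo's StartMovie: poll until the movie
// ends. Nothing in this loop looks at the keyboard, so ESC cannot end it.
int waitForMovie(Movie &movie) {
    int frames = 0;
    while (movie.isPlaying()) {
        movie.step();
        ++frames;
    }
    return frames;
}

// Modeled loosely on the full game: the same loop, but an override key
// press during playback escapes the movie.
int runMovieWithEscape(Movie &movie, const std::function<bool(int)> &escPressed) {
    int frames = 0;
    while (movie.isPlaying()) {
        if (escPressed(frames)) {
            movie.stop();
            break;
        }
        movie.step();
        ++frames;
    }
    return frames;
}
```

With a 100-frame movie, waitForMovie always plays all 100 frames, while runMovieWithEscape stops at the frame where the key press arrives.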

So, how did it work in the original interpreter in Windows and what can we do to fix it in ResidualVM? We'll keep digging in the next blog post!

Sunday, March 23, 2014

Detour: Fixing a Segfault

When playing with the demo version of EMI (which is available for free from the ResidualVM project), I found that when you press F1, which usually brings you to a menu, the game pauses, but no menu appears. While this might be the intended behavior, pressing Esc afterwards causes ResidualVM to crash. Let's explore this crash and figure out how to fix it!

First, we'll need to restart ResidualVM with a debugger. Start up ResidualVM with the command:
  • gdb ./residualvm
When the prompt comes up, type run to begin running ResidualVM. Start the game as usual, and then trigger the crash. This time, when the crash occurs, we can see in the gdb window that the debugger has caught the error and stopped execution:
Investigating the ResidualVM Crash
We can see that there is an error from ResidualVM itself, warning us that there was a null value in the Lua engine from the 'gettable' tag method, and also the actual crash in ResidualVM which follows that.

It's helpful to see how we got here, and GDB lets us do that with the backtrace. Type bt to see the backtrace, which is a list of the function calls that brought us to this location. Typing up or down moves the current position up or down in the stack, letting us look at different variables at each call. To print out a variable's value, use the print or p command followed by the name of the variable you're interested in. It's also sometimes useful to see the code that preceded the error: the list or l command prints the code around the current position in the stack.

Finding the Error with GDB
After some exploration, we see that we dereferenced the g_registry variable when it was null, causing the segfault.

Now that we know what's causing the problem, let's look at the engine code for the registry functions and see if we can identify why this is breaking in the demo.

In engines/grim/grim.cpp (line 87), we see that the registry is initialized when the engine is started, but only if the game type is GType_GRIM. In engines/grim/detection.cpp (line 413), we see that the game type for the EMI Demo is GType_MONKEY4. So the registry is never created for the demo, and because of that, the g_registry variable is never set.

The registry for Grim Fandango holds settings and values that would normally be found in the Windows registry. As of now, there are no EMI-specific registry options set up, and the registry is specific to Grim Fandango. Although we'll probably need to implement something like this later on, for now, let's fix the segfault by preventing the code from dereferencing g_registry when it's null.

Searching through the code, we find that the only accesses to g_registry that don't check for a null value are in engines/grim/lua_v1.cpp in the functions:
  • Lua_V1::ReadRegistryValue
  • Lua_V1::WriteRegistryValue
  • Lua_V1::postRestoreHandle
Looking back at our backtrace, we see that, indeed, the code went through the ReadRegistryValue function before segfaulting. Since the code in postRestoreHandle that accesses g_registry is only used in games with the GType_GRIM tag, we can safely ignore this instance, as those games will always have the g_registry variable set. With checks added to ReadRegistryValue and WriteRegistryValue that skip the registry actions when g_registry is null, the bug is fixed and ResidualVM no longer crashes. I also added a warning to let anyone else know that there was an attempt to access a registry that didn't exist. To fix this bug properly, we should override these functions so that they point to the EMI_Registry instead. For now, we'll stick with just fixing the segfault.
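Here is a minimal C++ sketch of both the cause and the fix. The Registry class, the GameType enum, initRegistry, readRegistryValue and writeRegistryValue are simplified, hypothetical stand-ins modeled on the description above; the real Lua_V1::ReadRegistryValue takes its arguments from the Lua stack and uses ResidualVM's own warning(). The point is only the pattern: the registry is created for GType_GRIM alone, so every access has to tolerate a null g_registry. Returning an empty string for a missing registry is an assumption of the sketch.

```cpp
#include <iostream>
#include <map>
#include <string>

enum GameType { GType_GRIM, GType_MONKEY4 };

// Simplified stand-in for Grim's registry (settings normally kept in the
// Windows registry).
class Registry {
public:
    std::string get(const std::string &key, const std::string &def) const {
        auto it = _values.find(key);
        return it == _values.end() ? def : it->second;
    }
    void set(const std::string &key, const std::string &val) { _values[key] = val; }
private:
    std::map<std::string, std::string> _values;
};

Registry *g_registry = nullptr;

// Mirrors the engine start-up: only Grim Fandango gets a registry.
void initRegistry(GameType type) {
    static Registry grimRegistry;
    g_registry = (type == GType_GRIM) ? &grimRegistry : nullptr;
}

// Guarded read: warn and fall back to a default instead of dereferencing null.
std::string readRegistryValue(const std::string &key) {
    if (!g_registry) {
        std::cerr << "warning: tried to read registry key " << key
                  << " but no registry exists\n";
        return "";
    }
    return g_registry->get(key, "");
}

// Guarded write: ignored, with a warning, when there is no registry.
void writeRegistryValue(const std::string &key, const std::string &val) {
    if (!g_registry) {
        std::cerr << "warning: tried to write registry key " << key
                  << " but no registry exists\n";
        return;
    }
    g_registry->set(key, val);
}
```

With initRegistry(GType_MONKEY4), reading SpewOnError now prints a warning and returns an empty string instead of crashing.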

So now we have fixed the segfault, but the game gets stuck instead, preventing the demo from continuing further. In the next post, we'll discuss how to fix this issue.

Friday, March 21, 2014

Identifying Variables and Functions

Continued from the previous entry

In this post, we'll be focusing on variables, function calls and the structure of the decompiled function, SetActorLocalAlpha.

In the previous entry, we examined the Lua script that calls this function and identified its arguments. Applying that knowledge to the disassembled code indicates that the first 4 calls to lua_lua2C actually fetch the parameters for the function. As such, this code can be rewritten with descriptive variable names. Additionally, the types of these parameters point to similarities with code that's already been written.

Starting with the first parameter, we see that in the script, the function is called by itself, with no colon or period operator. This indicates that the function is standalone and not a member of any class. Next, we see that the first parameter is self.hActor, as seen in the call to SetActorLocalAlpha in the previous post. Since we know that the variable is a member of the "self" object, we need to identify what type of value it holds. Often, it's possible to tell the object's type by looking at what sets the member variable. Searching through the scripts for "hActor =" will identify where the hActor variable is set, and we're in luck! The file _actors.lua is the only file in the scripts directory that matches this criterion. Let's take a look at where it's used.

The first hit we get is in the actorTemplate, a structure that is used in the Lua script as a template for all new actor objects. While this is useful for identifying members of the Actor class, this doesn't help with identifying the type for hActor. Let's move on to the next instance.

In this function, Actor.create, we see that the actorTemplate is copied into the variable local1. The variable later has the hActor member set by saving the return value of the function LoadActor. From the source for LoadActor found in engines/grim/lua_v1_actor.cpp (line 37), we can see that this function creates a new Actor, and therefore, the type for hActor is most likely an Actor. While this might seem a bit obvious, when things are less obvious, you'll still follow the same basic steps.

With the knowledge that this variable is an Actor object, we can improve our translated code by naming the variable that holds the 1st parameter actorObj. We can also continue through the code and simplify functions that use it as a parameter. It's also a good idea to compare against other uses of Actor objects to see whether already-rewritten code matches what we've found so far. In this case, we see a call to lua_isuserdata and lua_tag:
lua_isuserdata and lua_tag

The call to lua_isuserdata is checking that the 1st parameter contains an object with the type UserData. If the 1st parameter doesn't have a UserData object, the whole function will just return. In the second box, the call to lua_tag is comparing the UserData's tag with the number 52544341h. While this might seem like a random number, if we interpret this value as a string of four characters, they spell out 'RTCA', or 'ACTR' in little endian.
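The byte-order reasoning is easy to check in C++. This sketch reads the four bytes of a 32-bit tag in two orders: most significant byte first (how the hex constant reads) and least significant byte first (how a little-endian x86 stores it in memory). The helper names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Bytes from most to least significant: how the hex constant reads.
std::string tagAsWritten(uint32_t tag) {
    std::string s;
    for (int shift = 24; shift >= 0; shift -= 8)
        s += static_cast<char>((tag >> shift) & 0xFF);
    return s;
}

// Bytes from least to most significant: the order a little-endian
// x86 CPU stores them in memory.
std::string tagInMemoryOrder(uint32_t tag) {
    std::string s;
    for (int shift = 0; shift <= 24; shift += 8)
        s += static_cast<char>((tag >> shift) & 0xFF);
    return s;
}
```

For 0x52544341, tagAsWritten gives "RTCA" and tagInMemoryOrder gives "ACTR".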

Before we continue, let's look at what we mean by Lua UserData and tags. In the ResidualVM wiki, we see that in the modified version of Lua used in this engine, variables are saved in a pool, identified by their pool id number, and tagged with an identifier. The id number is used to retrieve the data from the pool, while the tag identifies the type of the object. Going back to engines/grim/lua_v1_actor.cpp, when the engine loads the Actor, it creates a new Actor object instance, then adds this instance to the pool. It also applies the tag, using the MKTAG macro.

So, applying this information, we interpret these two lines of code as checking whether the variable is UserData and, if so, whether it has the tag 'ACTR'. If not, the code returns. This bit of assembly can now be converted into C++:
The C++ code, translated from assembly
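Since the translated code appears here only as a screenshot, the following is a self-contained sketch of the same check. The LuaValue struct, the pool, and the isUserData/tagOf stand-ins for lua_isuserdata and lua_tag are assumptions made so the example compiles on its own; in ResidualVM these come from the engine's modified Lua, with different signatures. The tag value is the constant from the disassembly.

```cpp
#include <cstdint>
#include <map>

// Tag constant compared against in the disassembly ('ACTR' in memory order).
const uint32_t kActorTag = 0x52544341;

// Hypothetical stand-in for a value passed in from a script.
struct LuaValue {
    bool isUserData;
    int32_t poolId;  // id used to look the object up in the pool
    uint32_t tag;    // identifies the object's type
};

struct Actor {
    float alpha = 1.0f;
};

// Hypothetical object pool: pool id -> object.
std::map<int32_t, Actor> g_actorPool;

// Simplified stand-ins for lua_isuserdata() and lua_tag().
bool isUserData(const LuaValue &v) { return v.isUserData; }
uint32_t tagOf(const LuaValue &v) { return v.tag; }

// The guard at the start of SetActorLocalAlpha: give up unless the first
// parameter is userdata tagged as an actor, then fetch the Actor by id.
Actor *getActorParam(const LuaValue &actorObj) {
    if (!isUserData(actorObj) || tagOf(actorObj) != kActorTag)
        return nullptr;  // the original function simply returns here
    auto it = g_actorPool.find(actorObj.poolId);
    return it == g_actorPool.end() ? nullptr : &it->second;
}
```

Because of short-circuit evaluation, the tag is only read once the value is known to be userdata, matching the order of the two checks in the assembly.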
To this point, we haven't really added anything to the project yet since this code had already been worked out by one of the previous developers. In the next post, we'll start working on filling in the missing parts of the code with the information that we've learned so far.

Thursday, March 20, 2014

Working on Understanding the Function from the Other Side

Continued from the previous entry

After our work from the previous post, we have a skeleton of the functionality provided by the Lua command we're working on. However, working from assembly isn't always the best approach. For EMI, we already have a whole lot of help from the excellent work done by the developers who previously worked on Grim Fandango, EMI and Myst3 support in ResidualVM! In this project, they have provided us with code and tools to inspect the scripts that are being run in the game. Let's inspect a script that calls the function stub we're working on.

First, let's take a look at the game scripts. In EMI, the game scripts can be found in the file local.m4b. This file is actually a bundle of files, which can be extracted using the tool unlab, found in the residualvm-utils repository. Once you've built this utility, let's unpack local.m4b so we can get at the Lua scripts that make up the game. In the directory with the EMI data files:
  • mkdir local
  • cd local
  • unlab ../local.m4b
After running these commands, the local directory will contain a large number of files; the most important for us now are the files that end in .lua. If you inspect these files, you'll see that they're not text, but a binary format. To make the scripts readable, we'll use the tool delua, also found in the residualvm-utils repository. To make this easier, we'll just decode all of the scripts in the directory at once so we can easily search through them:
  • mkdir scripts
  • for i in *.lua; do delua $i > scripts/$i; done
  • cd scripts
With the scripts converted into a readable format, we can now search through them for instances where our function of interest is used:
  • grep SetActorLocalAlpha *.lua
From this, we can see that when SetActorLocalAlpha is called, it's called with 4 arguments. These correspond to the 4 calls to lua_lua2C at the beginning of the disassembled function from the previous entry. Importantly, we also see that the first argument passed is arg1.hActor. This is useful information because it gives us context for the type of the variable and hints at how it's used in the code. We can also look at the other Lua functions that call SetActorLocalAlpha to find the types of the variables used.

Before we go back to SetActorLocalAlpha, let's examine the set_vertex_alpha_mode Lua function more closely. As you can see in the code listing above, there are a bunch of unnamed variables. Let's figure out what they mean so that we can better identify the arguments to SetActorLocalAlpha. Let's start with the function arguments.

In Lua, a method can be called in two different ways: with a period or with a colon between the object and the method name. The difference is that the colon form transparently passes a reference to the object as the first argument, which by convention is named "self". In wed.lua, we see that the set_vertex_alpha_mode method is called using the colon operator, so we know that its first argument is really the "self" variable. Once we have identified a variable like this, we update the rest of the function to reflect this new knowledge. We'll continue this process, using cues from the calling function and the contents of the function to name the rest of the arguments and the local variables.

With all of this additional information, we can now continue filling in the details for the SetActorLocalAlpha function.

Wednesday, March 19, 2014

Filling in a Stubbed Function

Continued from the previous entry

A Stubbed Function is one in which the function is present, but doesn't implement the full behavior required. In ResidualVM, the EMI engine has a number of stubbed functions which represent the unfinished Lua function calls. These functions print a warning to the console to show that they're actually used, and when they're used. In the previous post, I identified a function, SetActorLocalAlpha that was stubbed and located the code in the original binary that implements this function. In this post, we'll work through the assembly and create a patch implementing the missing functionality from this call.
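For readers who haven't seen one, a stub in this style can be as small as a function that only reports that it was called. The STUB_FUNC macro and logWarning helper below are hypothetical illustrations, not ResidualVM's actual stub machinery; the sketch records the warning in a list so its behavior is easy to check, where the engine would print to the console.

```cpp
#include <string>
#include <vector>

// Collected warnings; the engine would print these to the console instead.
std::vector<std::string> g_warnings;

void logWarning(const std::string &msg) { g_warnings.push_back(msg); }

// Hypothetical stub macro: defines a function that only reports its use.
#define STUB_FUNC(name) \
    void name() { logWarning(std::string("Stub function: ") + #name); }

// Expands to: void SetActorLocalAlpha() { logWarning(...); }
STUB_FUNC(SetActorLocalAlpha)
```

Every call leaves a "Stub function: SetActorLocalAlpha" message behind, which is exactly the kind of console noise that points to unfinished functions.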

Start of the SetActorLocalAlpha Function
After renaming the routine to SetActorLocalAlpha, we start at the beginning of the disassembled code. Here, IDA shows a representation of the stack frame, with an entry for each of the local variables in this code. After these, the first real code from the function is present. On a first pass over the code, I usually start by examining any functions called from here. In this code, the first function we see called is:
?lua_lua2C@@YAIH@Z
While this looks a little confusing, the function's name has been encoded, or mangled, to ensure that it doesn't collide with any other named functions in this program. Mangling names allows for implementations with different calling parameters, such as the overloading functionality found in C++.
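To see how the mangled name carries the signature, here is a deliberately tiny C++17 decoder for one narrow case of the Microsoft Visual C++ scheme: a free __cdecl function (marked by @@YA) whose return and parameter types are simple built-in types. The single-letter codes are the standard MSVC ones for those types; anything outside this subset (pointers, classes, namespaces, back-references) is rejected rather than guessed. A real demangler such as undname or llvm-undname handles the full grammar.

```cpp
#include <map>
#include <optional>
#include <string>

// MSVC single-letter codes for a few built-in types.
static const std::map<char, std::string> kBuiltinTypes = {
    {'C', "signed char"}, {'D', "char"},  {'E', "unsigned char"},
    {'F', "short"},       {'G', "unsigned short"},
    {'H', "int"},         {'I', "unsigned int"},
    {'J', "long"},        {'K', "unsigned long"},
    {'M', "float"},       {'N', "double"}, {'X', "void"}};

// Decode "?name@@YA<ret><params>Z" for free __cdecl functions with
// built-in types only. Returns std::nullopt for anything else.
std::optional<std::string> demangleSimple(const std::string &m) {
    if (m.empty() || m[0] != '?')
        return std::nullopt;
    size_t at = m.find("@@YA");
    if (at == std::string::npos || at == 1)
        return std::nullopt;
    std::string name = m.substr(1, at - 1);
    size_t pos = at + 4;

    auto typeAt = [&](size_t i) -> std::optional<std::string> {
        if (i >= m.size()) return std::nullopt;
        auto it = kBuiltinTypes.find(m[i]);
        if (it == kBuiltinTypes.end()) return std::nullopt;
        return it->second;
    };

    auto ret = typeAt(pos++);  // return type comes first
    if (!ret) return std::nullopt;

    std::string params;
    if (pos < m.size() && m[pos] == 'X') {  // a lone X means (void)
        params = "void";
        ++pos;
    } else {
        // Otherwise parameters run until the '@' terminator.
        while (pos < m.size() && m[pos] != '@') {
            auto p = typeAt(pos++);
            if (!p || *p == "void") return std::nullopt;
            if (!params.empty()) params += ",";
            params += *p;
        }
        if (pos >= m.size() || params.empty()) return std::nullopt;
        ++pos;  // skip '@'
    }
    if (pos != m.size() - 1 || m[pos] != 'Z')  // Z: no throw specification
        return std::nullopt;
    return *ret + " __cdecl " + name + "(" + params + ")";
}
```

Feeding it ?lua_lua2C@@YAIH@Z produces unsigned int __cdecl lua_lua2C(int): I is the unsigned int return type, H the single int parameter, @ ends the parameter list, and the final Z marks the absent exception specification.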

Dropping this name into a demangler gives us this result:
unsigned int __cdecl lua_lua2C(int)
Which is a whole lot more readable! We know that the EMI engine uses Lua to execute the game scripting, so looking into the Lua documentation for what this does will help us to understand what the decompiled function is intended to do. We can also look at the existing code for ResidualVM and its Lua implementation for information. With some research, we find that Lua maintains a stack of values or objects, and that lua_lua2C is how a C function fetches values, such as its arguments, from this stack.
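As a mental model (not the actual Lua implementation), you can picture the arguments of a C function called from a script as a slice of the Lua stack, with lua_lua2C(n) returning a handle to the n-th argument, counting from 1. The CallFrame type, the string values, and the 0-means-no-object convention below are simplifications chosen for this sketch. This picture is also why a run of lua_lua2C calls at the top of a function looks like parameter fetching.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Handle = unsigned int;   // simplified lua_Object
const Handle kNoObject = 0;    // "no such argument"

// Simplified view of the stack while a C function runs: the caller's
// arguments occupy consecutive slots starting at index 'base'.
struct CallFrame {
    std::vector<std::string> stack;  // values shown as strings for illustration
    std::size_t base = 0;            // index of the first argument
    std::size_t numArgs = 0;         // how many arguments were passed

    // In the spirit of lua_lua2C(n): a handle to the n-th argument
    // (counting from 1), or kNoObject if there is no such argument.
    Handle lua2C(int n) const {
        if (n <= 0 || static_cast<std::size_t>(n) > numArgs)
            return kNoObject;
        return static_cast<Handle>(base + n);  // slot index + 1, never 0
    }

    // Resolve a handle back to the value it refers to.
    const std::string &valueOf(Handle h) const { return stack.at(h - 1); }
};
```

For a call with four arguments sitting above one unrelated stack slot, lua2C(1) through lua2C(4) resolve to those four values in order, and lua2C(5) returns kNoObject.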

As we go, we examine the structure of the branches and jumps (helpfully represented as blocks in IDA) to sketch out the shape of the code. I usually rewrite the code in C as I work through it, so after the first pass, my code contains the function calls, if/else statements and loops.
Code After the First Pass
The code at this point is really just pseudo-code, but we'll expand on it further in the next pass, in the next post.

Figuring Out What to Do

Continued from the previous entry

In this post, I'll show you what my workflow generally looks like for reverse engineering. In the past, I've worked on a few reverse engineering projects like the Broadcom BCM43xx and the Collie SD Card interface for Zaurus. In any reverse engineering project, the first thing to figure out is where to begin! For this post, I'll take a stubbed function from the ResidualVM code, explain how to find the original implementation, figure out what it does and then re-implement it.

After getting EMI running, I noticed that there were a lot of debug messages about stubbed functions in the console window.
Console window, filled with debug messages
Picking one at random, I decided to look into the function Lua_V2::SetActorLocalAlpha, but first, I needed to do a little bit of research!

Understanding what the structure of the program is before diving into the assembly is usually a good idea. From the documentation at the ResidualVM wiki, I saw that Lua was the scripting language running the game and that EMI's engine was structurally similar to the one used in Grim Fandango. Let's take a quick look at the source code for ResidualVM and investigate the structure some more.

In engines/grim/emi/lua_v2.h (line 39), we can see the list of Lua script functions that the engine provides. The actual code that implements these functions can be found in the rest of the files in engines/grim/emi/. Of note, the function we're interested in, SetActorLocalAlpha can be found in engines/grim/emi/lua_v2_actor.cpp at line 38. Helpfully, this code is partially completed, waiting to be finished!

So, to summarize what we have so far:
  • We have identified the function we'd like to work on
  • We have information about how the EMI engine was put together
  • We have found where the implementation will go once we've written it, and some helpful information from the stub function.
Next, we'll take a look at the binary from the original version of EMI. I'll be working with the patched version if you'd like to follow along.

I like working with IDA; it's a great reverse engineering tool! Luckily for poor students like me, a version of the tool is provided for free for non-commercial use. While all of the features of later versions would be nice, including a native Linux build, this will do for now. If you're not using Windows, Wine can be used to run IDA with almost no issues.

To begin with, I first checked to see if the function name we were interested in was in the executable at all. Some binaries are stripped or obfuscated, making this job a lot harder.
  • strings Monkey4.exe  | grep SetActorLocalAlpha
This returned two instances:
SetActorLocalAlpha
SetActorLocalAlpha: Actor isn't wearing any primitives!
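The strings | grep check amounts to scanning the file for runs of printable characters and keeping those that contain the name. Here is a minimal C++ sketch of that idea; it handles only single-byte printable ASCII and uses a minimum run length of 4, which matches GNU strings' default but none of its other options. The function names are hypothetical.

```cpp
#include <cctype>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Load a whole file (e.g. Monkey4.exe) as raw bytes.
std::vector<unsigned char> readFile(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Return every run of at least minLen printable ASCII bytes that
// contains 'needle', similar in spirit to `strings file | grep needle`.
std::vector<std::string> findStrings(const std::vector<unsigned char> &data,
                                     const std::string &needle,
                                     std::size_t minLen = 4) {
    std::vector<std::string> hits;
    std::string run;
    auto flush = [&]() {
        if (run.size() >= minLen && run.find(needle) != std::string::npos)
            hits.push_back(run);
        run.clear();
    };
    for (unsigned char c : data) {
        if (c < 0x80 && std::isprint(c))
            run += static_cast<char>(c);
        else
            flush();  // a non-printable byte ends the current run
    }
    flush();  // a run may also end at the end of the file
    return hits;
}
```

Calling findStrings(readFile("Monkey4.exe"), "SetActorLocalAlpha") should turn up the same kind of matches as the command line above.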
This looked promising! I put the Monkey4.exe binary into IDA and let it process the file. Once this was complete, I searched for the text SetActorLocalAlpha. Lucky for us, there's a jump table with the function name in ASCII, likely for the Lua scripting engine to convert the text into the actual function call. The entry for SetActorLocalAlpha is found at 0x004C06D8, and points to a function call at 0x00413570. We now have the entry point for the function we're interested in!
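The table IDA found pairs each script-visible name with the address of the native function that implements it. A C++ sketch of that kind of registration table follows; the ScriptFunction struct, the lookup helper, and the placeholder functions are hypothetical, and in the real binary each entry simply holds a pointer (for SetActorLocalAlpha, the one pointing at 0x00413570).

```cpp
#include <cstring>

using NativeFn = void (*)();

// Hypothetical name -> native function entry, like the ones IDA found.
struct ScriptFunction {
    const char *name;
    NativeFn fn;
};

// Placeholder implementations for the sketch.
void IsMoviePlayingImpl() {}
void SetActorLocalAlphaImpl() {}

const ScriptFunction kScriptFunctions[] = {
    {"IsMoviePlaying", IsMoviePlayingImpl},
    {"SetActorLocalAlpha", SetActorLocalAlphaImpl},
};

// What the script engine does conceptually: find the entry by name.
NativeFn lookupScriptFunction(const char *name) {
    for (const ScriptFunction &entry : kScriptFunctions)
        if (std::strcmp(entry.name, name) == 0)
            return entry.fn;
    return nullptr;
}
```

Following the pointer stored next to a name is what leads from the ASCII string to the code that implements it.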
In the next post, we'll investigate what can be learned from the Lua scripts that actually call this function and how it can be used to improve our code.

Introduction, Unpacking and Moving in

Hi! My name is Joe Jezak and I've decided to start working on fixing the issues with Escape from Monkey Island (EMI) in ResidualVM.

Usually, when I start working on a new project, the first step is to see what happens when you run it! So, to get started, I first found my EMI discs:

The Discs were hanging out with some old friends!
To prepare the discs for use in ResidualVM, I copied the contents of the Monkey4 directory on both discs to my hard drive. As noted in the instructions, you must rename Textures/FullMonkeyMap.imt; however, I found that the file from both CDs must be renamed, not just the one from disc 2. So, for disc 1, the file must be renamed to Textures/FullMonkeyMap1.imt, and for disc 2, to Textures/FullMonkeyMap2.imt. When copying the files, make sure that you copy the .m4b files from the MonkeyInstall directory as well! Finally, I found that the voiceAll.m4b file must be copied from disc 1; the copy from disc 2 causes an MD5 error. Also remember to copy the patch (if needed) and the data file from the Residual project into your EMI data directory.

Okay, so now we have the game data files. Great! The next step is to fetch the ResidualVM source code and build it by following the project's build directions. That wasn't too bad! It compiled cleanly on the first try with no issues. I started up the build and set it up by adding a game and pointing it at the location where I stored the data files.

Setting Up the Game
Now, I crossed my fingers and started up the game and it worked! Kind of:
There's something missing here...
It was apparent that there's an issue with fading between images, resulting in weird output like this. A bug to add to the list! However, things actually work a whole lot better than this first impression suggests. I was able to complete the whole first act (Act 0?) without any game-breaking bugs. There were plenty of issues, such as the hot coal floating around like mad and the wick on the cannon not burning away, but the game was playable. Great!
Guybrush Threepwood at his polygonal best
So, where to from here? Helpfully, the current build prints out a large number of debugging messages from the Lua interpreter pointing out places where there is missing code. I decided to pick one and see if I could figure out what needed to be in this function. But that's for the next entry; this one has gone on long enough.