Tuesday, August 19, 2014

The Raft Spin

After the rotation fixes, one weird issue remained with Guybrush, when getting on the raft in the set mot. When Guybrush got on the raft, he would strangely rotate in a circular motion once on the raft.

To debug the problem, I commented out the lines in the set file mot.lua regarding guybrush getting on the raft and found that it was a call to setrot which caused the issue to occur. In this call, setrot was called with TRUE as the fourth parameter, causing Guybrush to rotate over time instead of snapping to the rotation. Due to some difficulties with getting the rotation before Guybrush turned out of the retail version, I used a textObject in the lua script before setrot was called to display the values. Here's the retail version and ResidualVM:
Guybrush's rotation before getting on the raft in the Retail Version
Guybrush's rotation before getting on the raft in ResidualVM
As can be seen, the yaw parameter is inverted! So, is this the issue that's causing the weird rotation? I tried setting Guybrush's rotation before calling setrot(0, 0, 0, TRUE) with guybrush:setrot(180, 69.8599, 180) and found that the weird rotation still occurred. So, it seems there are actually two problems here:
  1. When rotating, Guybrush is sometimes rotated in axes that he shouldn't be when setrot is called without snapping to the new angles
  2. Guybrush's rotation is incorrect, with the yaw axis returning a negative value when it should be positive
To inspect the first problem, I tried setting Guybrush's rotation to 0 in each of the axes independently and inspecting the result and found that no axis was directly responsible for this rotation. I then tried setting them all to 0 with a snap and found that this looked correct (which was expected, obviously). The next experiment was to set each of the angles to a part of the rotation. I tried (0, 69.8599, 0) first and found that the result looked right, indicating that the pitch and roll axes might be causing the problem. I then tried (45, 69.8599, 45) and found that it also looked correct! However, at (90, 69.8599, 90), the issue re-appeared. In fact, any value below 90 seemed to work okay, while values greater than 90 caused problems. Out of curiosity, I then tried (-45, 69.8599, -45) and found that it also worked fine! In fact, values between -90 and 90 (non-inclusive) seemed to produce the correct result, while values outside that range produced incorrect behavior.

This pointed to a problem with the conversions between Euler angles and other rotation forms in that angles between (-90, 90) were calculated correctly, but larger angles were not. First, I checked the functions that I wrote by implementing a simple perl program to compute the rotation matrices from Euler Angles and back again. This program can be found here and produces results demonstrating the correct equations for setting a rotation matrix from Euler Angles and retrieving the Euler Angles from the matrix:
Output from computing the Euler -> Matrix conversion for ZYX
Fortunately (or unfortunately for finding the bug!) the results matched and there wasn't an obvious problem, aside from the absence of checking for singularities or gimbal lock checking. However, I was having trouble keeping the X's for the axis and the X's for the Euler Angles straight, so as a part of this work, I reworked the Euler Angle nomenclature to make the naming more consistent with the usage.

With the nomenclature straightened out, I then added a fix for singularities that arise from when it's impossible to determine the correct result. For an Euler Order of ZYX, we can see that when the Y Euler Angle is pi/2 (or 90 degrees), it will be impossible to differentiate the other two angles. We can see this by inspecting the figure above, where we can see that if the result of Cy is 0, then the components that are used by the arctan cannot be used to determine the result.  Cy is 0 when the cosine of the Y axis angle is pi/2 or -pi/2. This state results in a condition called gimbal lock. To work around this issue, we can check to see if the conversion will produce gimbal lock and if so, chose a single rotation.

So, does correcting for gimbal lock fix any of our problems? We can be fairly certain that it won't just by inspection. In the scenario above, Guybrush's coordinates would not cause a singularity because the second angle (yaw) is not +/- 90 degrees. Still, it was a necessary fix!

So, what is really going wrong here? Let's try some more experimentation! First, let's rotate Guybrush from (0,0,0) to (180, 0, 0) and (0,0,0) to (0, 0, 180):
Pitch 180 Degrees - (0,0,0) to (180, 0, 0)
Roll 180 Degrees - (0,0,0) to (0, 0, 180)

Rotation of (0,0,0) to (180, 0, 180)
We can see that the sum of these rotations is going to result in Guybrush facing 180 degrees opposite his position on the yaw axis from the (0,0,0) rotation. So, we can see that a rotation from (180, 0, 180) to (0, 0, 0) should only produce rotation in the yaw axis instead of rotating through the pitch and roll axes as well.

What causes this extra rotation? In ResidualVM, the turn methods of the Actor use Euler Angles to compute the rotation by making the angles equal. If instead, a Quaternion is used for rotation, this problem should be resolved. Now that the problem has been identified, in the next post, I'll address the issue.

Monday, August 18, 2014

What's up with Timmy?

Continued from the previous entry

In the previous entry, we started debugging the issues with the lava puzzle and found a problem with Guybrush colliding with the logs. We also found that a similar problem was occurring with the monkey Timmy, on the beach at Monkey Island. In this post, we'll try to figure out why Timmy isn't colliding with Guybrush and just running right through him instead.

Timmy on the Beach, Running Wild!
To begin with, I examined the script that controlled Timmy's running behavior. When an actor has a monkey that's running around the actor's position, a variable named monkey_range is set, which describes the maximum distance that the monkey can run from the actor in each direction. A random value within this range is chosen for the monkey to go to. The monkey runs to this location using the runto method, which is implemented by the Lua command WalkActorTo, which ultimately runs the Actor::walkTo method in ResidualVM. Within this method, the actor checks for collisions while plotting the path.

So, why isn't Timmy colliding with Guybrush? If we comment the check to print out when actors are checked for collision, we find that we can make Timmy collide with Guybrush if Guybrush walks into him, but not if Timmy runs into Guybrush when Guybrush isn't moving. If we inspect the logic that makes Timmy move, we find that there's a method in the Actor class called updateWalk which updates the character's position over the duration of the walk. If we add some code here to force a collision check when the actor is moving, Timmy stops as expected when he runs into Guybrush.

I suspect that this fixes the lava puzzle collisions too, but currently the logs in the puzzle aren't showing. I'll tackle this in the next post! This work is submitted as PR #1050.

TinyGL and the Font Segfault

With the integrations of my text patches, the text rendering is much closer to the original version. Unfortunately, it was found that they also caused memory corruption, resulting in segfaults when using the TinyGL rendering path.

First, I wasn't sure exactly what was causing the segfaults, just that they were happening at seemingly random intervals and that the backtrace usually involved a text function such as drawing or freeing a text object. I put the game into gdb and found that I couldn't reliably trigger the bug to investigate the problem. I then tried running ResidualVM in valgrind and found that when creating the text object with the function createTextObject in the TinyGL rendering path, when creating the text bitmap, there was a buffer overrun. To be sure, I added an assertion to this path that asserted when the bitmap offset was larger than the bitmap storage. This confirmed the issue!

So, what changed to cause the bitmap size to be smaller than expected? If we look at the code, it appears that the TinyGL engine is allocating the text object bitmap using the kerned height and width. As it turns out, this isn't actually enough space to hold the completed text because there's an additional piece of information, the column offset. When a letter is printed without using the whole kerned space (as in the character "1"), there's an offset added to the starting column to account for this, letting the game use less storage for the character. When I accounted for this extra width, along with adding a new function to take the y offset into consideration as well, the segfault was fixed and text was rendered properly in TinyGL again! This was submitted in PR #1024.

Tuesday, August 12, 2014

Simplifying Attaching and Detaching

Continued from the previous entry.

After all of the previous work on attaching and detaching and getting it pretty close to working, I found that there were still a few issues, like the cuffs not attaching properly to Guybrush's wrists in the set wed. Additionally, after a code review by Akz, he suggested that the code could be made simpler and that instead of trying to match the internal matrices in the retail version, I should just aim to match the output. Of note was that I needed to insert a large number of transposes to make the intermediate values work.

As such, I scrapped the previous code and instead tried to fix the problem of attaching and detaching with all of the new information that I'd learned over the course of debugging this problem. This new version of the attach and detach code is much less intrusive and achieves the same result with far fewer calculations. This code was re-submitted as PR #948. So, how does the pole in the set pph look now?

The set pph after getting off the raft
Looks pretty close, the pole is in the right position, but let's compare it with the original:
Difference between retail and ResidualVM
It's a little hard to see, but there's still a different in the orientation of the pole. I'm not exactly sure where this comes from, but I suspect that it has to do with the joint rotation of Guybrush's hand. Still, the result is much better than it was before.

The next outstanding rotation issue that needed to be looked into was Issue #958. I thought that I had fixed this bug, but it appears to still be broken. The best item for displaying the issue is the itty-bitty brass screw, which is rotating on the wrong axis.
Retail
ResidualVM
As you can see, the screw starts in the wrong orientation and is upside down! Additionally, the rotation of some items in the inventory are incorrect with rotation being applied to the wrong axis. As it turns out, the orientation was incorrect because the rotation was applied backwards. Inverting the rotation Quaternion result of GetRotationQuat made this problem go away, but the resulting angle was wrong for the attached actor. This was because GetRotationQuat was being used as part of the attach and detach sequence. Iverting those usages as well found that the result now matched what we had before, except that the screw was rotated correctly.

The pole in the set pph is effected by the rotation of the hand joint in Guybrush's hand. As it turns out, the code that I had written for the attach and detach had neglected the position of the attached joint. Incorporating this rotation and translation slightly improved the position and rotation of the pole in the set mot.

At this point, I was hoping that all of the issues were resolved enough to move on, but two major issues remain:
  1. The boats in the lava puzzle were placed incorrectly, appearing (well, not actually appearing) under the lava instead of on top.
  2. When Guybrush rotates on the raft after boarding, he spins on a wrong axis.
In the next post, we'll tackle these issues!

Monday, August 4, 2014

Working on the Remaining Issues for Text Rendering

Continued from the previous entry.

In the previous entry, we (mostly) fixed all of the text problems in the set wed, but discovered that there were issues in the set gmi. Since both Elaine and Guybrush's text were in the wrong place, I decided to make the game draw her bounding box too. Here's a typical scene with some text:

Bounding boxes drawn for Elaine and Guybrush
We can see here that the bounding box for Elaine is very wrong, while Guybrush's looks pretty good. So, there appear to be two more issues related to text positioning here:
  1. The bounding boxes are still wrong
  2. The text isn't being placed in the right position above the bounding box
We'll investigate each issue, starting with the bounding boxes. First, how are the bounding boxes calculated? In the graphics driver, there is a function that iterates through the transformed and projected vertices in each component of a costume and finds the bounding box coordinates that enclose these points. To debug this issue, I stepped through each costume component and tried using that as the bounding box coordinates. It turns out that using only the active costume as specified by the wearChore for the dimensions of the bounding box makes the bounding box fit the actor. The result of this change (in PR #976) is shown below:

With the new Bounding Box code
I then decided that the reposition function seemed like a hack, so I examined the retail version in a debugger and found that a simple manipulation of the X and Y coordinates produced just about the same result. With this function removed and the code simplified, here's the current output difference for this text in the set gmi:

Pretty close!
We're very close now with the text placement in this scene, with the text still being placed one pixel too low, but otherwise perfect! Let's check out some other, previously problematic scenes, starting with the set mel, when you meet Carla and Otis again for the first time:

Retail
ResidualVM
As you can see, the text placement for both characters is a little bit off. After some inspection, I found that Carla's text is placed using a specified X and Y coordinate passed as a parameter to SayLine. This text is placed by ResidualVM 9 pixels too high, as compared to the retail version. It turns out that in ResidualVM, there was code that shifted the text when it was too close to the edge, which was using a constant value for shift amount. Replacing this constant with the text height puts Carla's text 1 pixel too low, like most of the other text so far, but very close to the right position.

For Otis' text, the positioning is much further off. His text is placed 41 pixels too low and 17 pixels to the right. After some work with the debugger, I found that the Retail version of EMI was placing Otis' "Yeeps!" at the position: (394, 25). To check that this was accurate, I checked the screen  shot above and found that location, and measured to make sure that the output made sense with these numbers:

Yeeps!
So, the location matches, with the red dot in the middle of the second "e" representing the point specified by the retail version (394, 25). We can see that this dot is 25px from the top and 394 pixels to the right. We can also see that the text is centered in the X axis around this point. ResidualVM does center the text properly, but the point selected is (411, 66). So, where does the difference come from? I continued debugging until I found the bounding box information for Otis and added that to the image below, along with the "Yeeps!" text from the retail version:

Retail bounding box in red, ResidualVM in blue (center line drawn as well)



So, my previous fixes for the bounding box above didn't match the retail version, and we can see that the retail text is placed centered at the top of the retail bounding box. The retail version calculates a bounding box at point (394, 25) with a size of (254, 365).

After some discussion with Akz and Klusark on IRC and some test code, Akz saw that the retail version's bounding box could be computed from the coordinates provided as part of the costume data. Using the bounding box position as the center and projecting the points of the bounding box into screen coordinates resulted in a bounding box that matched the retail version. With this change, the text is now rendering in the right place, although it's still 1 pixel lower than it should be. Here's the difference in the scene again with Otis and Carla:

The text is pretty close now!
Another issue arose when Klusark tested the set tri. In this set, the text was layered on top of other text, resulting in a mess! This was actually the original bug that inspired all of the text refactoring in the first place, so I probably should have tested this sooner. Why wasn't this text fixed by these changes? It turns out that the text is placed based on the position of the ship. So, when the ship moves, the text should be placed relative to the ship's location. To find the ship's location, a function called WorldToScreen is called to find the screen coordinates for the ship. The text is then rendered using these coordinates. In the retail version, instead of passing a screen coordinate (0,0 -> 640,480), it instead passes a floating point value in the range -1 to 1. Removing the conversion from the function and adding it as needed throughout the graphics primitive drawing code fixes the issue, although the result isn't exactly right, with the text being drawn above the ship rather than below. Still, it's quite a bit better than it was!

Working text on the Tri-Island Map
This work can be found in the updated PR #964.

Tuesday, July 29, 2014

Endianness Fixes

I had some time this weekend where I wasn't at home able to work on ResidualVM. This provided an excellent opportunity to take a look at some of the issues that ResidualVM has on big-endian architectures like my PowerBook G4. I'm currently running Gentoo Linux on my Powerbook, and was able to play other OpenGL games without issue and Grim appears to work fairly well in ResidualVM. However, EMI is a mess!

The LucasArts intro has the wrong colors!
Even worse, the set wed has incorrect colors and alpha channel issues!
As can be seen above, the colors for many objects appear to be swapped from the intended colors. Let's investigate these issues, starting with the intro screen.

When debugging graphics issues like the intro screen, I often examine the colors between the working, little-endian version big-endian version to see if there the colors are easily identified as being swapped. Unfortunately, this wasn't immediately obvious as being the case. The colors were completely different:
No direct byte swap will make the right turn into the left...
However, I then checked and found that when loading bitmaps, two formats are present in ResidualVM, a packed 1555 format and a 32 bit format. With this information, it seems that the data was specifying an alpha channel, which was modifying the color components that were swapped. After some experimentation, I found that the native endianness is the preferred byte order for the Escape from Monkey Island binary format containing a bitmap image, a .til file. This worked well for 32-bit .til files, as found in the Windows version of EMI. However, the PS2 version uses 16-bit .til files. I found that it was easiest to convert the 16-bit input to a 32-bit output and doing the bit manipulation when the image was loaded.

With the .til files fixed and the intro displaying the correct colors, let's re-examine the set wed, to see if that was effected by the change:
That's a whole lot better!
While the output is now a whole lot closer, there's still a bunch of issues to be addressed. Let's focus on the background first. In the background, the enemy pirate ship is bouncing in the waves. This effect is actually achieved by an Actor object rotating and moving up and down on a timer. The Actor is drawn with a Sprite object, which are drawn using an image from a .tga file. After some inspection, I found that the desired behavior is to again read from the file in the native format. Another tweak was required to change the display format from BGR to RGB and this problem was solved. Here's the output now:
Getting closer!
Of note, this change also fixed the backgrounds for dialog boxes and other components that use .tga files. The next most obvious issue is that Guybrush and the brazier are both rendered as completely black with no color for the textures. I noticed that in the Act 1 title card, the actors were actually visible, although dim. This gave me the hunch that the lighting was to blame for the dark actors. To test this, I tried commenting out the lighting code when drawing the actor and found that the texture was drawn mostly correctly, albeit without lighting:
With lighting commented out
I then inspected the values used to dim Guybrush and found that the in the lighting object, the x, y, and z values were all NaN, causing the output to be black. As most endian issues appear when using data from the other endian systems, I looked at the loadBinary method for lighting and found that the floats used by the lighting were read in directly, rather than being interpreted by get_float() which performs the endian swap if needed. After adjusting both the Light and Shadow loadBinary methods, I got the correct output for the set wed as seen below:
Looks like it's all working correctly now!
The only outstanding issue right now on big-endian machines is that the 1555 ARGB color for the video frame isn't unpacked correctly. I suspect that an approach similar to how I fixed the 16 bit .tga files would work here as well, but I need to see if there's a better way to do this first. This work is in PR #974.

Tuesday, July 22, 2014

More on Text Positioning

Continued from the previous entry.

Let's take stock of the current state of the text positioning changes after the last blog post. First, let's look at the differences in the set wed when Guybrush says "There's an enemy pirate fighting over there.":

State of the text difference at the start of this post in the set wed
With the changes from before, we can see that there is less difference than when we started, but that the text for Guybrush's speech bubble is positioned to the right by 1 pixel and towards the bottom by 5 pixels. Likewise, the text for "Look at enemy pirate" is closer, but still not quite right. It appears to be 11 pixels too high, towards the top compared to the retail version.

Another place with a similar issue is the text at the top of the inventory. It appears that the text is drawn 12 pixels too high, resulting in this difference:
 Misplaced Inventory text
As an aside, the differences in object rotation in the inventory appear to be related to be related to the rotation and attachment work I've been blogging about. That work wasn't in this branch when I took the screen shot!

Anyway, there's definitely a problem with the placement of text on the Y axis, with a similar problem displayed in both sets. As it turns out, there's code that specifically moves the text in the Y direction by 12 pixels in ResidualVM. This code is found in textobject.cpp and appears to have been added with this commit. Commenting out this adjustment results in the text for the inventory being placed in the correct location and bringing the command line in the set wed within 1 pixel of the correct location. Let's take a new screen shot with this change:
Current text difference at this point
As can be seen, the action sentence at the bottom is very nearly correct and only off by one pixel. However, the speech bubble got worse. Now, the ResidualVM text is still 1 pixel to the right, but 17 pixels lower than it should be.

My first thought was that there was simply a blank line inserted below the text, but the font height for EMI was 26 pixels, not 17. I next checked the bounding boxes for the actors to see if they made sense by drawing a green line around Guybrush representing the bounding box whenever he spoke. Here's what the output of that was:
Guybrush's bounding box
However, in order to get the bounding box to draw properly, I had to subtract the Y components from 480. In order to make the code simpler, I removed the subtraction from the Y component when computing the bounding box from the bounding box calculation and re-ran the test. Here's the result after modifying the bounding box code:
The green line is the bounding box for Guybrush's actor
Although the bounding box looks a little worse (see: Guybrush's feet and hair), I checked the text position with this new bounding box and found that it was off by 26 pixels, which if you remember, was the height of the font. Subtracting 1 from the number of lines that the text was pushed up by gets the fonts for this scene rendered very closely to the original:
Very close now!
We can see that the fonts are still off by 1, but for now, this is close enough! So, let's check to see if there are problems in other scenes. Unfortunately, there are:
The text is in the wrong place again!
Here, you can see that Guybrush's speech is rendered at the wrong place in the set gmi.

As a check, I also tried this branch with Grim Fandango and found that it too had fonts drawn too low, with text 13 pixels too low and offset by 2 pixels to the right for the top line and 1 for the bottom line:

Grim also has differences in the font location
As an aside, bgK also tested this with the French version of Grim, which was reported to be problematic with using the kerned font sizes. I'm happy to say that the crashing issues from that appear to be resolved!

In the next blog post, we'll take a look at these two issues and see if we can resolve them.

Monday, July 21, 2014

Making the Correct Rotations Work

Continued from the previous entry.

In the previous posts, I've managed to get the relative position and rotation values to match the ones from the retail version of EMI. In this post, I'm going to discuss how to present these rotations on the screen, fixing the messed up image we left off with in the last post:
Where we left off...
Let's start by taking a look at the apitrace output of this scene again. I started by running the retail version of the game with the options "-w -gl" to start in a windowed mode using OpenGL. After capturing the apitrace of jumping ahead in the script and moving to the set pph using the jump target "160",  I then loaded the trace into qapitrace. First, I had the trace build thumbnails of all of the frames so that I could identify the first frame that drew the scene pictured above. With the scene identified, I then looked through the data captured for this frame to inspect the rotation matrices that were responsible for rotating and placing the rendered objects.

I decided to focus on the trap first because it is a constant object in the scene and isn't moving. To find the part of the trace responsible for drawing the trap, I browsed through the API calls, checking the Surfaces tab to see if the trap had been rendered yet:
APITrace output, examining the drawing of the trap
With this section of trace found, I could then inspect the OpenGL state (in the Parameters tab) when this was rendered, providing us with the transforms used to place the object in the scene. With these transforms in hand, I then proceeded back to the ResidualVM code and checked the matrices against the ones I found in the trace. From this analysis, it appears that the Projection Matrix is correct, but the ModelView Matrix is incorrect. Let's take a look at what ResidualVM is doing to arrive at the ModelView Matrix for the model of the trap.

First, I inspected the value of the ModelView matrix at the start of drawing the trap and found that it was set to an identity matrix. This was then multiplied by another matrix using glMultMatrixf, then translated using glTranslatef, then multiplied once more, again with glMultMatrixf. In the retail version, the only command run is to load the matrix using glLoadMatrix, indicating that the matrix was probably prepared ahead of time or done in software. That said, with the correct result for the ModelView matrix, we can go back through the steps taken in ResidualVM and correct the problem.

So, where in the code do we find this sequence for setting the ModelView? In the OpenGL path (engines/grim/gfx_opengl.cpp), there's a function called startActorDraw which prepares the ModelView matrix for drawing the object. In the non-overworld case, it multiplies the current camera rotation, then translates with the camera position. Then, it takes the Actor's final matrix, transposes it and multiplies the ModelView by the result. There are a few likely possibilities for where this implementation is wrong:
  1. The camera rotation is wrong because we changed the Euler Order for the actor as part of this fix
  2. The actor's final rotation matrix is incorrect
Since I suspect that the final rotation matrix is correct because of the work we've done up to now, I'm going to first start with checking into the camera rotation. To begin, where does the camera rotation come from? In the same file, there's two functions that are responsible for setting up the camera, positionCamera and setupCamera. When the set is loaded, there are a number of parameters that define the camera for each setup in the set:
  • position - The point position of the camera
  • interest - The point that the camera is pointed at
  • roll - The roll of the camera
  • fov - The field of view of the camera
  • nclip - The near clipping plane of the viewing frustum
  • fclip - The far clipping plane of the viewing frustum
However, after some investigation, I found that the interest and roll parameters are really just quaternion components, unlike in Grim. The existing ResidualVM code was using them as Quaternion  components, but they were using the original Grim Fandango names for them.

Also, when creating a set with these parameters, the retail version of EMI creates an actor object that's kept as part of the setup to represent the camera. This allows it to keep track of the position and rotation information in the same way as the actors in the set.

Following the usage of these parameters in ResidualVM to the function setupCamera, I compared the calculated frustum to the apitrace results and found that ResidualVM was computing this correctly. So the issue was most likely related to the positionCamera function. So, how can we fix it?

I started by examining the retail version again in IDA Pro and found that there are a number of camera functions to retrieve the camera rotation and position. Unfortunately, these weren't yet functional in ResidualVM, so I added them as a part of this work. These functions are:
  • GetCameraYaw
  • GetCameraRoll
  • GetCameraPitch
  • PitchCamera
  • RollCamera
  • YawCamera
To implement this functionality, I decided to keep the rotation of the camera internally as a matrix, and calculate the Euler Angles as needed. When writing out the value for saved games, I use the Quaternion representation of this rotation so that it's compatible with the existing version. Why did I choose to keep the rotation as a matrix? In all of the graphical code that uses the camera rotation, it previously had to convert between a quaternion and a matrix each time the rotation was used. By keeping the rotation as a matrix, we avoid that overhead for the common case and only convert when reading or writing the setup rotation values for saved games. After checking these functions against the retail version, we see that the camera rotation values are placed and rotated properly.

Still, at this point, after all of this work, the graphics themselves haven't been fixed. We haven't change any of the output since the beginning of this post, just the representation in the code. So, how are the rotation matrices computed for display? First, here's the desired ModelView matrix for the trap, extracted with apitrace:

Crawdad Trap: Retail ModelView Matrix
-0.516999 -0.234287 0.823299 0.0
-0.0168343 0.964411 0.263873 0.0
0.85582 -0.122562 0.502544 0.0
-2.60927 -1.20781 -19.2888 1.0

What can we see just from this inspection? Well, we can see that there's a 3x3 rotation matrix in the upper right corner, and a 1x3 position vector in the last row. Let's compare this with the matrix from ResidualVM:

Crawdad Trap: ResidualVM ModelView Matrix
-0.542569 0.212863 -0.812594 0.0
-0.016834 0.964411 0.263873 0.0
-0.839843 -0.156848 0.519676 0.0
-2.60927 -1.20781 -19.2888 1.0

First, we can see that the ResidualVM position agrees with the position from the Retail version of EMI. We can also see that the middle row of the rotation matrix matches, but the others are slightly off. What might be causing this problem? Well, if we inspect the rotation for the trap, we can see that only the Yaw parameter is rotated, which corresponds to a matrix where the middle row is an identity. This explains why the middle row matches! Now, how about the other rows?

Let's start by going back to startActorDraw in gfx_opengl.cpp. I first checked to see what the current value of the ModelView matrix was before this function acted on it and found that the matrix was an identity, but the 3rd row was multiplied by -1. This was caused by the previous call to glScalef(1.0, 1.0, -1.0). I also checked the stored rotation for the trap's actor and compared that with ResidualVM and found that both ResidualVM and the retail version had the same values. With this information in hand, I then tried changing the order of the operations, as a rotation followed by a translation might not have the same result as the opposite operation. In the end, performing the rotation first, then computing the translation resulted in the correct rotation result. Here's the output now:
Looks pretty good?
Awesome! It looks like it's working properly again! How about the pole after detaching? What are the differences remaining in this scene?
I knew something was still missing! :(

At first glance, it looks fine, but a glaring difference remains: the pole! It's completely missing in ResidualVM. It appears that there's still some users of getFinalMatrix which seems to be producing an incorrect result. In the next post, we'll dig into those and see if we can fix those rotations as well.

As an aside, one interesting observation that can be made from the apitrace results is that ResidualVM makes a lot more OpenGL calls to produce the same screen. It appears that in the original version, instead of always drawing each vertex as ResidualVM does, the game instead sometimes passes a pointer to a list of vertices and uses the following sequence to draw the object:

     glEnable(GL_CULL_FACE)
     glFrontFace(GL_CW)
     glBindTexture
     glTexCoordPointer
     glColorPointer
     glVertexPointer
     glDrawElements 

This indicates that the original engine passes these coordinates in a list rather than individually. It would probably be a good idea to make the same optimization in ResidualVM as well because as of right now, the original takes ~5k calls to draw the set pph, while ResidualVM takes ~58k. Maybe a later project? We'll see!

Tuesday, July 15, 2014

Positioning textObjects

There are a lot of differences with the positioning of textObjects between the retail version of EMI and ResdiualVM's implementation. Let's take the scene wed and compare the text positions. First, here's the scene in the retail version: 
Retail screen shot from the set wed
And here's the same scene in ResidualVM:
ResidualVM screen shot from the set wed
And finally, here's the difference between the two:
Difference between the two images
We can ignore the ship in the background, since it's moving, and I didn't capture it in exactly the same place. We also can ignore the shadows, Akz has already addressed that problem. That leaves us with the text errors, exactly what we're trying to work out!

While it's obvious to the eye that the text "There's an enemy pirate over there." is in the wrong place and has a different wrapping point, there's also a mismatch between where the text is placed for "Look at enemy pirate".

Additionally, in some cut scenes, with subtitles on, the text is placed in the wrong location. A good example of this problem is the scene during travel to Jambalaya Island, where the exclamations of the crew are drawn right on top of each other! Another common issue is that the text is placed in the wrong place, obscuring the game play. A good example of this is when reading the directions to get to Pegnose Pete's house.

In ResidualVM, text elements are drawn using a textObject. In the Lua scripts, textObjects are created by calling the Lua/C function MakeTextObject. The parameters for the textObject are the string that's going to be displayed and a parameters table, with various options for configuring the text. We can remove this textObject from the screen by calling KillTextObject and passing the handle that was returned by MakeTextObject.  Let's try drawing some text with the retail version of EMI and observe the behavior. Let's start by just writing some text to the 0, 0 coordinate position:

textObj = MakeTextObject("Hello Blog Readers!", { x = 0.0, y = 0.0, font = verb_font, fgcolor = White })

Here's the output of running that command:
There's the text!
An interesting aspect to the positioning of the text is that it appears to be using a coordinate system with a basis in the center of the screen. Next, lets try exploring the extents of the text placement. Do negative values work? (Yes!) What is the highest and lowest X and Y value for placing text that still appears on the screen? (It appears that -1, -1 is the lower left corner and 1,1 is the upper right corner). Is this mechanism the only way text is added to the screen? (Nope, there's also SayLine, PrintLine and other methods, so when debugging, the method used will be important).

With this information in hand, I started by rebasing Botje's changes (to allow the X and Y values to be passed as floats) against the current master. These changes, made the text appear more correct for scenes where the X and Y values are passed from the Lua, such as the directions to Pegnose Pete's House, but this change didn't effect most of the erroneous text placement.

I then inspected the SayLine Lua/C function and found that it sets a variable when X and Y aren't passed in from the Lua script, which has the actor set the position relative to the actor's bounding box. Simply using the other side of the bounding box for the Y position moves the actor's speech to a much more correct location:
After the bounding box side switch
The next task I tackled was the incorrect word wrapping, as seen in the speech bubble, "There's an enemy pirate fighting over there.". Although the ResidualVM version actually looks a little bit better than the retail version here, there are other scenes where the word wrapping is worse, so let's try to fix it!

The word wrapping is done in the function setupText, which is found in the textobject.cpp file. Inside this function, the line width is computed by adding each character width in the line together. Strangely, this uses the maximum value between the character width and something called the dataWidth. Without a comment, it's hard to tell what dataWidth actually means, so, I first started by identifying when this code was added to ResidualVM. This was a harder task than anticipated because the file was moved twice over the course of development. To trace its history, I first reset git to the first commit that moved the file and then, pushed the HEAD of the repository back one more commit. This restored the original version of the moved file and let me explore its history further. After one more file move, I found that this was the commit that added the dataWidth field. Unfortunately, there wasn't much information in the commit message either, so I decided to try to test some strings in the retail version and measure what the width of the displayed text in pixels.

So, it appears that the dataWidth is the actual width of the character, but the width includes adjustments to the spacing (or kerning) between the letters to make them look nice. As an example, let's look at a simple string, "Wa":

     MakeTextObject("Wa", { x = 0.0, y = 0.0, fgcolor = White))

Which results in:
Demonstrating the font kerning
As can be seen, the letter "a" actually overlaps with the letter W by two pixels. It appears that ResidualVM gets this right and does use the correct values for drawing the characters as this matches perfectly with the retail version. So, our issue is with determining when to word wrap, which depends on the width of the rendered text. Let's examine the width variables to figure out what combination results in the correct width being calculated.

For each font character, there's the dataWidth and width sizes, which of these is the actual character width and which is the kerned width? Here's the dataWidth and width values for "W" and "a":

Value       W     
width 18
dataWidth 21
Value a
width 7
dataWidth 9

So, it appears that the "width" value is the kerned value, while dataWidth is the actual size of the font bitmap image. Checking this against the image above shows that the dataWidth values match the pixel size of the letters exactly. So, if we add the dataWidth values together, our characters will be wider than what's actually displayed to the screen. We can fix this by always using the width value instead of the maximum of the two as is done now.

So, let's take a look at the result difference after this change, when Guybrush is speaking the line "There's an enemy pirate fighting over there.":
Getting closer!
We can see now that the lines are the same length now, and appear to be in the correct location on the X axis, but shifted in the Y axis. So, first, is there a common shift between all of these lines? Or in other words, are the lines shifted by a constant value? Unfortunately, it appears that this isn't the case. The top line of Guybrush's speech is 1 pixel too far to the right and 7 pixels too far towards the bottom of the screen, while the bottom line is offset by 1 pixel to the right and 13 pixels too far towards the bottom of the screen. The text "Look at enemy pirate" is also offset by 14 pixels too high towards the top of the screen.

Let's look into the difference in line spacing between the two lines for Guybrush's speech. It turns out that in ResidualVM, the height of the font is modified when loaded if the character is taller than the specified font height. As before, this is intentional because of kerning. We do want some of the characters to sink lower or rise higher than the rest because it's pleasing to the eye. Simply commenting the section of code that modifies the height results in the correct behavior, so I looked up why that work around was inserted in the first place, which led to this commit. So, it appears that there was a crash bug associated with the height. I suspect that the problem would be fixed if the renderer uses the dataHeight instead of the height to draw the text.

As this blog post is getting long and I need to check with the other developers about this code, we'll wrap it up here for now. I'll continue in the next blog post with fixing the remaining font issues. All of the changes in this post can be found in PR #964.

Monday, July 14, 2014

Fixing the Position

Continued from the previous entry.

So now, let's return to position! First, let's examine the output of each of the actors for getpos(), getworldpos() and getrot() in the retail version of EMI, after running the rotation script:

actor1 Output
getpos (1, 4, 5)
getworldpos (1, 4, 5)
getrot (5, 15, 25)
actor2 Output
getpos (0.854459, -6.7806, -5.41233)
getworldpos (3, -1, -2)
getrot (3.98862, 13.1276, 7.01816)
actor3 Output
getpos (-6.7911, 1.63313, -0.462462)
getworldpos (-3, -4, 2)
getrot (6.82211, -9.04857, -16.7055)

After a lot of debugging with IDA Pro, I discovered that the retail version actually maintains the computed or provided values for the Euler Angles, the same rotation as a Quaternion and a 3x3 Rotation Matrix. Additionally, it keeps the relative position (to the attached actor) and the world position. Instead of doing this in ResidualVM, these values are computed when needed.

It's important to note that there are functions in the retail version which keep these values synchronized, so the matrix, quaternion, or Euler Angle that you're inspecting might not be up to date. So, in the retail version, the status of each of the different rotation representations are stored in a bitfield which should be checked before assuming that a rotation value is correct. With those caveats, I then inspected the three actors' Quaternion and Matrix representations after attachment and compared them to the internal values in ResidualVM. For example, here is the 3x3 Rotation Matrix representation of the rotation of Actor 2 after attachment, taken from the retail version of EMI:

actor2: Retail Rotation Matrix
0.7370587 0.49632958 -0.45869532
-0.47202796 0.86379832 0.17618749
0.48366731 0.086646488 0.87095153

So, how did I fix the rotation? After each change to the rotation, I added breakpoints to check the Rotation Matrix value and changed ResidualVM to match these intermediate values. One major structural change that I made was to separate the generation of a rotation matrix from the generation of the final matrix (including the actor's position for translation). As these operations must be applied in the same order, it was better to do them separately.

Now, with a working rotation implementation, we can finally fix the position. As discussed before, getworldpos() should reflect the original location of the actor before attachment. To find the world position, we take the relative positions of each of the attached actors and combine them to find the actual position coordinates.

Finally, what are the position coordinates supposed to represent? It's the actor's relative position to the parent actor when attached, or the global position, when unattached. With correct rotation, the previous math that implemented the position actually matches with the retail version and no further work was required!

In addition to attaching, detaching doesn't work quite right in the current code either. When detached, in ResidualVM, an actor simply reverts to a rotation of (0,0,0) and doesn't have the correct position, instead of using the same position and rotation, but in a global basis. We can fix the rotation problem by simply getting the rotation quaternion for this actor, which already sums up all of the combined rotation of all of the actors and retrieving the Euler Angles from this quaternion. This fixes the problem and matches the retail version!

All of the the above changes can be found in the updated PR #948.

To test this implementation, klusark created a lua script that iterates over a bunch of angles and writes out the result to a file. I added the script to the local.m4b file and ran it on both the retail version and ResidualVM, then compared the two with another script that checks one against the other and ignores rounding errors. While there are some rounding differences, the output now appears to be correct!

So, in the end, after all of this work, does everything look correct? Was the location of the pole in mot fixed?

Nope, not rendering correctly still! And we broke the other actors!
Unfortunately, no. We still need to correct the rendering position of the actors now to match the changes that we made to the position and rotation of the actor. We'll tackle this in the next blog post.

Sunday, July 6, 2014

Rotation, Position and Debugging

I've been having a hard time trying to reconcile approaches that seem mathematically correct with the actual output from the Retail version of EMI. To figure out exactly what was going wrong, I decided to go back to the debugger and probe the internal values generated by the Retail version of EMI to compare with the results that are being produced by ResidualVM.

To begin with, I needed to figure out how the Retail version maintained the actor's position. To find this out, I investigated the Lua entry point for SetActorRot to find where in memory the rotation information was stored. After some inspection, it turns out that in EMI, the rotation information is converted from the Euler Angles that were passed in from the Lua to a rotation matrix located 0xA0 bytes from the beginning of the allocated Actor. How do we know where the start of the allocated Actor is? For C++, the X86 ABI uses the convention that the object being referenced is passed in the ECX register when calling an object method. We can be fairly certain that in this case, it is the Actor object because the Lua/C++ interface passes in the actor handle as the first parameter, and when that actor's handle is used to retrieve the Actor object, the return value is saved into the ECX register.

How did I know that the rotation was stored in a rotation matrix? I observed that after the Euler Angles were passed in, a series of floating point operations occurred, resulting in writing to nine values in the Actor object. Writing nine values was likely to be a 3x3 matrix, so I checked these values for Guybrush in the set wed by setting the debugger to break when SetActorRot is called and the actor handle matched the integer stored in hActor for Guybrush, which seems to always be 4.  To be sure that this was Guybrush, I also checked the actor's position at offset 0x94, which matched with the results from guybrush:getpos().

Here's the rotation matrix for Guybrush in the scene wed, before any movement, in the Retail version:

Guybrush Rotation Matrix
-0.34202039 0.0 -0.93969256
0.0 1.0 0.0
0.93969256 0.0 -0.3402039

This rotation matrix was created by setrot(0, 110, 0). So, I then tested and found that there were multiple Euler Angle Orders that can reproduce this matrix. This is because there's only one rotation axis that's being changed, so there is ambiguity between the other two.

So, lets investigate the more complex attaching and detaching scenario that I outlined in the previous post. We'll start with only the first actor, and check that rotation matrix first. First, I needed to identify the actor handle ID, so I ran the rotation script and checked the value of hActor for at1, at2 and at3. The actor handles for each, after running the script for the first time in the set wed, are: 106, 107 and 108, respectively. Recall that the first actor is rotated with the call setrot(5, 15, 25). This results in this matrix in the Retail version:

Actor1 Rotation Matrix
0.87542605 0.40821794 -0.25881907
-0.40056604 0.9123922 0.084185995
0.27051076 0.029975513 0.96225017

In this case, there should be no ambiguity, with only one rotation that produces the correct result because each of the rotations are different. The rotation that produces the correct result in ResidualVM is:

     Math::Matrix4 m;
     m.buildFromXYZ(getRoll(), getYaw(), getPitch(), Math::EO_ZYX);
     m.transpose();

With this approach now reproducing the same results, in the next post, we'll examine attaching again with the extra information we've learned here.