Development Journey on Game Decompilation Using AI — Part 2
Let's go deeper by decompiling a whole module using AI
In the first chapter, I described my work on decompiling Marun and an attempt to add AI support to decomp.me. In this chapter, we’ll dive deeper into decompiling a full module from Sonic Advance 3 using AI. But before we begin, I want to provide more context about this journey.
The first chapter's work ran from December 2024 to February 2025. After that, I spent two months traveling and had little time for decompilation.
During that break, I realized I hadn’t communicated my work effectively. AI performed impressively while decompiling Marun, and I wanted to share those results. But others got poor results using AI on modern games or with poor prompts, leading to frustration. I had unintentionally overhyped the AI’s role.
Meanwhile, AI advanced rapidly. The Model Context Protocol (MCP) became a de facto standard after OpenAI's adoption, and Claude 3.7 Sonnet introduced powerful hybrid reasoning.
Now that I’m back, I’ve decided to:
Start writing and streaming to document my journey, share reproducible methods, and get more people into decompilation, because it’s genuinely fun.
And of course, continue decompiling Sonic Advance 3 using the latest AI tools!
Let's decompile a new enemy module using AI
In the first chapter, I gave a high-level overview of the Marun decompilation. Now, let’s dive in and decompile Kyacchaa together, step by step.
Conveniently, the first function in the enemy module initializes its structure. That makes it a great starting point, since understanding the struct will be useful to decompile the next functions.
So, let's start!
Decompiling the create entity function
After some trial and error, I came up with a prompt following this pattern to decompile this first function:
You are decompiling an assembly in ARM7 of a Gameboy Advance game.
You know that this assembly:
```
{{ assembly code from CreateEntity_Marun }}
```
Translates to this C code:
```
{{ C code from CreateEntity_Marun }}
```
The above function uses the following struct:
```
{{ C code for struct Marun }}
```
Given the above context, translate this assembly to an equivalent C code:
```
{{ assembly code from CreateEntity_Kyacchaa }}
```
In addition to the function, also creates a new struct, called Kyacchaa,
similar to Marun, that will be used for the new C function you are writing.
I used Marun as the example because I had worked on it before. It was a no-brainer.
Then, I ran the prompt through DeepSeek R1, ChatGPT, and Claude 3.7 Sonnet, all on their free tiers.
I had forgotten how slow DeepSeek R1 can be! It took a few minutes to generate the C code, and the result wasn’t as good as those from ChatGPT or Claude. In January, DeepSeek R1 felt like the state of the art. But now, at least for this task, ChatGPT and Claude seem to have pulled ahead.
I made two tests using ChatGPT:
In the first, I didn’t include the Marun struct in the prompt and made no mention of the Kyacchaa struct. After a few minor fixes (which I’ll cover soon), the generated code reached an 87% match. It's great!
In the second test, I used the full prompt, including the struct. Surprisingly, ChatGPT produced messier code with a worse match rate: 72%.
I also ran the same two tests on Claude. The first one returned a 92% match, and the second one returned a 99% match! It's a fantastic result!
Bringing the code from a 99% to a 100% match was a straightforward task, and I've opened the PR. We’re officially kicking off the decompilation of this enemy!
Common issues
Now, let’s go over the common issues that occur when copying code from AI and pasting it into decomp.me. These are usually minor and include the following:
The C compiler we use only supports variable declarations at the beginning of a block, so I have to move them up.
The AI-generated code doesn’t include declarations for external functions, so I need to add those manually.
Occasionally, the code references non-existent properties within structures. It normally happens when the prompt is missing the type definition for this struct.
It sometimes miscalculates structure sizes, offsets, or array indices.
These issues could potentially be addressed with a more detailed prompt or an agent. For the first one, I tried asking in the prompt for variables to be declared only at the beginning of each block, but that added constraint seemed too demanding, and it ended up lowering the overall quality of the generated code.
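The missing-declaration issue is the easiest one to check mechanically before pasting into decomp.me. The following is only a rough sketch, not a tool from this project: it uses regular expressions instead of a real C parser, the helper name `undeclared_calls` is made up for illustration, and it will also report macro invocations as if they were functions.

```python
import re

# Keywords that can be followed by "(" or precede a call, but are not functions.
C_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof", "do", "else"}

# Any identifier immediately followed by "(" is treated as a call.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")

# A line that starts with a type word, then an identifier, then "(" is treated
# as a function declaration or definition (heuristic, assumption).
DECL_RE = re.compile(
    r"^\s*(?:(?:static|extern|const)\s+)*([A-Za-z_]\w*)[\s\*]+([A-Za-z_]\w*)\s*\(",
    re.MULTILINE,
)


def undeclared_calls(c_source: str) -> list[str]:
    """Return names that are called in c_source but never declared in it."""
    declared = {
        m.group(2)
        for m in DECL_RE.finditer(c_source)
        if m.group(1) not in C_KEYWORDS  # skip "return foo(...)" and similar
    }
    called = {name for name in CALL_RE.findall(c_source) if name not in C_KEYWORDS}
    return sorted(called - declared)
```

Each name it reports is a candidate for a manual declaration at the top of the scratch; anything that turns out to be a macro can simply be ignored.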
Decompiling the second function
Next, I started decompiling the following function, `sub_806599C`.
Matching Decompilation 101: While it’s not mandatory to decompile functions in the same order they appear in the assembly, following that order can be helpful: It allows us to test the matching rate against the entire project, since the function order affects the binary hash calculation.
I followed the same approach as before: I wrote a prompt using a function from Marun as an example and asked both Claude and ChatGPT to decompile it. However, since ChatGPT consistently produced lower-quality code compared to Claude, I stopped using it and focused solely on Claude.
For this function, Claude did a good job: its code got an 85% match after some small fixes.
This helped clarify that `Kyacchaa`'s struct contains two `Sprite` properties, which is somewhat unusual! Most of the other decompiled enemy structs include only a single `Sprite`.
Also, most of the mismatches come from register allocation: the target assembly uses one register where the current assembly uses another. The code itself is mostly right.
After some research and trial and error, I manually improved the struct and corrected a few offsets, but I still couldn’t resolve the register mismatches. So, I decided to give Claude another shot, this time using a better example: I found another enemy in the SA3 codebase, Minimole, which also has two `Sprite` properties. I ran Claude using the new prompt and… an anonymous user forked my scratch and achieved a 100% match 😅
Claude's code was mostly on par with the anonymous user's: it creates a single `Sprite` variable within the function, which fixed the incorrect register handling. Good catch!
Finding good examples
You might be wondering how I select examples for the prompt. While this process could certainly be automated in the future, my current approach is manual but systematic.
For functions that call external routines, I use a matching strategy: if a function contains `bl foo_1234`, I search for other decompiled functions that also call `foo_1234`. I then trace back through the git history to find their original assembly and include both the assembly and the C code as examples in the prompt.
For functions that don’t make external calls, I look for similar functions based on assembly patterns: essentially finding other self-contained functions that have comparable structure and complexity when examining their assembly code.
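The search for functions that share a `bl` target is the part that lends itself most easily to automation. Here is a minimal sketch of that idea, under a few assumptions: the already-decompiled functions are available as a dictionary from function name to C source, calls appear in the assembly as plain `bl symbol` lines, and the helper names `called_symbols` and `rank_examples` are invented for this illustration. Recovering the original assembly from the git history is left out.

```python
import re

# Matches "bl sub_805CD20" but not "ble", "blt" or "blx" (whitespace must follow "bl").
BL_RE = re.compile(r"^\s*bl\s+(\w+)", re.MULTILINE)


def called_symbols(asm: str) -> set[str]:
    """Collect the symbols that a piece of assembly calls with `bl`."""
    return set(BL_RE.findall(asm))


def rank_examples(target_asm: str, decompiled: dict[str, str]) -> list[str]:
    """Rank decompiled functions by how many callees they share with the target.

    `decompiled` maps a function name to its C source (assumed input format).
    Functions that share no callee with the target are dropped.
    """
    targets = called_symbols(target_asm)
    scored = []
    for name, c_source in decompiled.items():
        shared = {
            sym for sym in targets
            if re.search(rf"\b{re.escape(sym)}\s*\(", c_source)
        }
        if shared:
            scored.append((len(shared), name))
    # Most shared callees first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]
```

The top-ranked names are only candidates: I would still read them before putting one in a prompt, since sharing a callee does not guarantee a similar structure.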
Decompiling the rest of the functions
I’ll briefly cover how I decompiled the remaining functions, since the approach is largely consistent across all of them.
`sub_8065A8C`: Easy win! Fed Claude the equivalent Minimole function as an example and got a close 95% match. Just two quick line fixes and it was matched. Neat!
`sub_8065B0`: Started with a function from Minimole as the example, but it gave poor results. Switched to a better example and Claude hit an 85% match. Fixed some array indices and a struct type, then got the 100% match. Fantastic!
`sub_8065B90`: This was trickier. I noticed that this function calls `sub_805CD20`, a function commonly used across many enemies. This insight led me to use a Marun function as the example, for which Claude returned code with an 84% match. I pushed it to 91% through manual fixes, but hit a wall with the final 9%. Fortunately, Jace from the Sonic reverse engineering community helped, getting the 100% match, but using a `goto`. Then freshollie refined it further to achieve a match without any `goto`.
`sub_8065C48`: Another tricky function, since it's self-contained and I could not find any similar function by reading the assembly, so I had no good examples to feed Claude. Still, it managed to return code with a 58% match. Despite the low score, it did help identify issues in the enemy struct. After manually improving the struct, I asked Claude again to iterate on the new code and it brought me to a 61% match. Still far from the target, I eventually set this function aside to focus on the others. Once those were complete, I asked on Discord for help. The contributors quickly pinpointed the problem: Claude had over-complicated things with excessive variables. They delivered a 100% match with simpler code.
`sub_8065E48`: Claude hit a 90% match but with a weird empty block! Easy fix though, since I knew a similar function that let me patch it up to a full match manually.
`sub_8065EB0`: Claude hit a 91% match. The gap came from a wrong enemy struct, which used `s32` instead of `u16`, plus some miscalculated offsets. It was easy to fix: the `ldrh` (target) vs `ldr` (actual) clearly indicated the `s32`/`u16` type confusion.
For the following functions, the AI returned code that matched 99–100%: Task_Kyacchaa, sub_8065F5C, sub_8065F30, sub_8065CE0, sub_8065F10, and TaskDestructor_Kyacchaa.
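The `ldrh` versus `ldr` clue from `sub_8065EB0` generalizes: the load mnemonic in the target assembly tells you how wide the struct field is, and sometimes whether it is signed. The sketch below encodes that rule of thumb as a small lookup. The helper names are made up for illustration, the type names follow the `u16`/`s32` style used above, and the result is a first guess only, since a compiler can also load a narrower value and extend it with separate instructions.

```python
# Width in bytes and signedness implied by ARM/Thumb load mnemonics.
# None means the load itself does not reveal the signedness.
LOAD_WIDTHS = {
    "ldrb": (1, False),   # zero-extended byte
    "ldrsb": (1, True),   # sign-extended byte
    "ldrh": (2, False),   # zero-extended halfword
    "ldrsh": (2, True),   # sign-extended halfword
    "ldr": (4, None),     # full 32-bit word (could also be a pointer)
}


def candidate_types(mnemonic: str) -> list[str]:
    """Return the integer field types consistent with a load mnemonic."""
    width, signed = LOAD_WIDTHS[mnemonic.lower()]
    bits = width * 8
    if signed is None:
        return [f"s{bits}", f"u{bits}"]
    return [f"{'s' if signed else 'u'}{bits}"]


def explain_mismatch(target: str, actual: str) -> str:
    """Describe what a target/actual load mismatch says about the field type."""
    want = " or ".join(candidate_types(target))
    have = " or ".join(candidate_types(actual))
    return f"target uses {target} ({want}); current code compiles to {actual} ({have})"
```

For the case above, `explain_mismatch("ldrh", "ldr")` points straight at a 16-bit unsigned field that was declared as a 32-bit one.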
At the end, my prompt template was the following:
You are decompiling an assembly function in ARM7 from a Gameboy Advance game.
# Example
You know that this assembly:
```
{{ assembly code of the example function }}
```
Translates to this C code:
```
{{ C code of the example function }}
```
This is the definition for `{{ argument type name used in the example function }}`:
```
{{ typedef for the argument type }}
```
# Task
Given the above context, translate this assembly to an equivalent C code:
```
{{ assembly code that I want to decompile }}
```
You know that this assembly function returns `{{ type that this function returns }}` and receives the following struct as the parameter:
```
{{ argument typedef that this function receive }}
```
# Context
Pay attention to the following definitions:
```
{{ any relevant typedefs }}
```
The key is including all useful type definitions without overloading the prompt with too much information.
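Filling this template by hand for every function gets repetitive, so it is a natural first piece to script. The following is a minimal sketch of a prompt builder that mirrors the template above; the function and parameter names are my own invention for illustration, and every argument is plain text that you still have to gather yourself.

```python
FENCE = "`" * 3  # assembled at runtime so this listing contains no literal fence


def fenced(body: str) -> str:
    """Wrap a snippet in a fenced block, as the prompt template does."""
    return f"{FENCE}\n{body.strip()}\n{FENCE}"


def build_prompt(example_asm, example_c, example_type_name, example_typedef,
                 target_asm, return_type, arg_typedef, context_typedefs):
    """Fill the prompt template; all arguments are strings supplied by the caller."""
    parts = [
        "You are decompiling an assembly function in ARM7 from a Gameboy Advance game.",
        "# Example",
        "You know that this assembly:",
        fenced(example_asm),
        "Translates to this C code:",
        fenced(example_c),
        f"This is the definition for `{example_type_name}`:",
        fenced(example_typedef),
        "# Task",
        "Given the above context, translate this assembly to an equivalent C code:",
        fenced(target_asm),
        f"You know that this assembly function returns `{return_type}` "
        "and receives the following struct as the parameter:",
        fenced(arg_typedef),
        "# Context",
        "Pay attention to the following definitions:",
        fenced(context_typedefs),
    ]
    return "\n".join(parts)
```

Keeping each section as a separate argument makes it easy to swap the example function without touching the rest of the prompt, which is the change I made most often while iterating.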
Finally, since I was more familiar with the project conventions, as well as the macros and named constants to use, the PR review went much more smoothly than the first one and was quickly merged. It increased the match rate for Sonic Advance 3 by 0.2% 🎉
Breakdown
So, how did AI perform at decompiling this module? Let's look at it in a table:
Function | Match Rate | Difficulty to Finish | Note
---|---|---|---
CreateEntity_Kyacchaa | 99% | easy |
sub_806599C | 85% | medium | the second prompt got close to 100%
sub_8065A8C | 95% | easy |
sub_8065B0 | 85% | easy |
sub_8065B90 | 91% | hard | needed help to finish
sub_8065C48 | 61% | hard | needed help to finish
sub_8065E48 | 90% | easy |
sub_8065EB0 | 91% | easy |
Task_Kyacchaa | 99-100% | easy |
sub_8065F5C | 99-100% | easy |
sub_8065F30 | 99-100% | easy |
sub_8065CE0 | 99-100% | easy |
sub_8065F10 | 99-100% | easy |
TaskDestructor_Kyacchaa | 99-100% | easy |
Aside from two functions, the decompilation process went smoothly and was quite straightforward!
Final Thoughts
AI proved incredibly helpful for decompiling this module, and there is still room to extend and automate this approach for more complex cases. For instance, it could automatically suggest good examples and necessary context for the prompt… perhaps through a VS Code extension?
By improving the AI workflow to be truly automatic, we could focus our efforts on challenging functions while letting AI handle the straightforward ones independently.
In the next chapter, I’ll walk through the steps to automate this process by developing a VS Code extension!
See you in the next chapter!
Follow me on Twitter and Bluesky to stay in the loop. You can also watch me coding on Twitch. I’m active on the decomp.me Discord server as well, where my username is `trickster.42`.