Exploring Animal Crossing's Bugged Trigram Check
Introduction
The original Animal Crossing for the GameCube allows you to mail letters to your resident villagers. To handle villager responses to these letters, the developers implemented several checks that internally score the contents of any letter you send to a villager. This internal score affects how the villager responds to the letter, as well as your friendship with them and whether or not they will give you a gift. Full details on this letter scoring system can be found within my letter scoring tool.
The Trigram Check
This post focuses on one of the seven checks the game performs to score your letter, and that check does not work as intended. Internally called “check B,” it uses a trigram-based system to try to ensure you are writing real English words. It checks the first three characters of every word in your letter and cross-references them with a pre-made lookup table. The game has 26 lookup tables, one for each letter of the alphabet. Each one is an array of character pairs that are considered valid to follow that key letter. For example, the trigram table for the letter “A” looks like this:
static u8 str_a_table[] = {
CHAR_b, CHAR_l,
CHAR_b, CHAR_o,
CHAR_b, CHAR_r,
CHAR_b, CHAR_s,
...
0, 0
};
So, if any word in your letter begins with the letter ‘A’, the game references this lookup table to check the two characters that follow the ‘A’. This way, words like “able”, “above”, and “abrupt” are all considered valid because they begin with { a, b, l }, { a, b, o }, and { a, b, r }, respectively. However, under this system a word like “abbey” would not be considered valid, since { b, b } does not appear in the trigram lookup table for the letter ‘A’. Again, there is a table like this for each letter of the alphabet, and together they define all of the trigrams that the game considers valid. In terms of scoring, this check counts all of the valid trigrams it finds in your letter, multiplies that count by 3, and adds the result to your letter’s total score.
Checking the beginning of words for common trigrams like this isn’t a bad idea, and I think it was actually a pretty clever way to favorably score letters that contain real words. However, as mentioned, this check does not work as intended due to a large oversight.
The Bug
Whenever the game references a trigram table, it loops through each pair of characters looking for a match. Naturally, the game is supposed to stop checking when it reaches the end of the table, but this does not happen in the North American / Australian versions of the game. Instead, the game continues looking for matching pairs of characters past the end of the array in RAM. This happens because the game only stops looking when it comes across a special “stop character” at an even index. This stop character is internally hexadecimal 0x7F, and every table is supposed to end with a pair of these stop characters. Instead, every table ends with 0x00, which does not instruct the game to stop checking for trigrams. Thus, whenever a table is referenced for a lookup and no matching pair of characters is found, the game keeps reading into RAM until it comes across a stray 0x7F byte at an even index that instructs it to stop. Note that this 0x7F byte must occur at an offset in RAM ending in 0x0, 0x2, 0x4, 0x6, 0x8, 0xA, 0xC, or 0xE. The game increments its search index by 2 on each loop so that it reads characters in pairs, so any stop value found at an odd offset is skipped over.
The Effect
The trigram tables are all laid out back-to-back, in alphabetical order, in RAM, starting at offset 0x8069F320. Because indexing does not stop at the end of each table, each letter’s trigram table effectively adopts the valid character pairs of every letter that comes after it alphabetically. For example, the trigram table for the letter ‘B’ looks like this:
static u8 str_b_table[] = {
CHAR_a, CHAR_b,
CHAR_a, CHAR_c,
CHAR_a, CHAR_d,
CHAR_a, CHAR_g,
...
0, 0
};
Because this trigram table is laid out right after trigram table ‘A’ in RAM, all of these character pairs are now valid for the letter ‘A’ as well. This means nonsense words like “Aablsdasda” count as valid. The two characters after the ‘A’ are { a, b }, a pair that is valid in trigram table ‘B’, and table ‘A’ adopts it due to the bug. The overall effect follows directly: any word beginning with the letter ‘A’ effectively adopts all the valid character pairs of trigram tables ‘B’ through ‘Z’. This continues down the chain. Any word beginning with the letter ‘B’ adopts the valid character pairs of trigram tables ‘C’ through ‘Z’, and so on.
After the end of trigram table ‘Z’, the game reads past all the trigram tables and keeps checking for valid character pairs in RAM, interpreting the byte values as characters. Again, this continues until the game comes across a byte with a value of 0x7F at an even offset. Luckily, this region of RAM is laid out consistently, and the first 0x7F byte at an even offset after the trigram tables always occurs at offset 0x806A102E. This allows the game to return from its trigram checks and prevents a crash. Still, this stop character sits a whopping 5,858 bytes after the end of the ‘Z’ trigram table. In effect, an extra 2,929 pairs of characters are appended as a valid table to every trigram table. Many of these characters are repeated, and many cannot even be entered using the game’s UI keyboard, which makes them superfluous and a bit of a waste to check. Still, there are some valid character pairs here, such as { !, (space) }, which would make something like “A! ” count as valid.
For my letter scoring tool, I took the liberty of converting all 5,858 extra bytes to the character values Animal Crossing reads them as, while removing invalid and duplicate entries. So, even though the game only intended for there to be 776 valid trigrams, there are actually 23,670 valid trigrams that can be typed in practice.
Tables
| Trigram table | Intended trigrams | Effective (bugged) trigrams |
|---|---|---|
| A | 57 | 1000 |
| B | 49 | 973 |
| C | 44 | 968 |
| D | 49 | 968 |
| E | 28 | 966 |
| F | 39 | 955 |
| G | 26 | 953 |
| H | 36 | 953 |
| I | 19 | 952 |
| J | 9 | 944 |
| K | 10 | 944 |
| L | 40 | 943 |
| M | 38 | 940 |
| N | 24 | 938 |
| O | 25 | 935 |
| P | 38 | 918 |
| Q | 3 | 913 |
| R | 34 | 911 |
| S | 87 | 904 |
| T | 51 | 857 |
| U | 11 | 835 |
| V | 10 | 825 |
| W | 40 | 820 |
| X | 1 | 788 |
| Y | 7 | 787 |
| Z | 1 | 780 |
| Total | 776 | 23670 |