Help:Strings
A "string" in computer science is just a list of characters (letters, digits, and typographic symbols) that are used in a program. For example, every student's first program is some variant of the "Hello World" example, which simply prints "Hello, world!" to the screen and exits. "Hello, world!" is a string. In this case, the data for this string would be contained in the program listing, but strings can also be loaded externally, such as from a game's data files. They can also be found inside "metadata" that describe certain files; for example, your iPod knows the artist and album that a song comes from because specially formatted strings inside the file (usually called "tags") provide that information.
That's a brief discussion about strings in general. But this article is really about strings, the program.
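GNU strings is a command-line utility that can be used to find text data in ROMs. All it does is scan a file for runs of consecutive printable characters and list every run that is long enough (4 characters by default). The run does not have to be NUL-terminated; any non-printable byte, or the end of the file, ends it.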
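The scan described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual GNU strings source: the function name find_strings and the choice to treat exactly ASCII 0x20-0x7E plus tab as printable are assumptions made for this example.

```python
# Bytes treated as printable in this sketch: ASCII 0x20-0x7E plus tab.
# (An assumption for illustration; the real tool's rules may differ.)
PRINTABLE = frozenset(range(0x20, 0x7F)) | {0x09}


def find_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of at least min_len printable bytes."""
    start = None  # offset where the current printable run began, if any
    for i, byte in enumerate(data):
        if byte in PRINTABLE:
            if start is None:
                start = i
        else:
            # A non-printable byte ends the run; keep it only if long enough.
            if start is not None and i - start >= min_len:
                yield start, data[start:i].decode("ascii")
            start = None
    # A run that reaches the end of the data still counts.
    if start is not None and len(data) - start >= min_len:
        yield start, data[start:].decode("ascii")


if __name__ == "__main__":
    sample = b"\x00\x01Hello, world!\x00\xffabc\x00GAME OVER\xfe"
    for offset, text in find_strings(sample):
        print(f"{offset:#06x}  {text}")
```

In the sample data, "abc" is skipped because it is shorter than the minimum length, while the two longer runs are reported along with their offsets.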
It is of limited use because it will not work if the data is compressed, encrypted, or stored using a character table rather than a standard encoding. However, in some games, this isn't an issue; some games, like Phantasy Star, use plain ASCII text for all of the text in the game, while others, like Pachi Com, have hidden ASCII strings.
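To see why a character table defeats this kind of search, here is a small Python sketch. The table is entirely hypothetical (letters mapped to 0x80 and up, space mapped to 0xFF) and is not taken from any real game; it only shows that text stored this way contains no printable ASCII bytes for strings to find.

```python
# Hypothetical character table, invented for illustration only:
# 'A'..'Z' are stored as 0x80..0x99 and a space is stored as 0xFF.
TABLE = {0x80 + i: ch for i, ch in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ")}
TABLE[0xFF] = " "
ENCODE = {ch: code for code, ch in TABLE.items()}


def encode(text: str) -> bytes:
    """Convert text to bytes using the hypothetical table."""
    return bytes(ENCODE[ch] for ch in text)


def decode(data: bytes) -> str:
    """Convert table-encoded bytes back to text ('?' for unknown bytes)."""
    return "".join(TABLE.get(b, "?") for b in data)


def printable_count(data: bytes) -> int:
    """Count bytes in the printable ASCII range 0x20-0x7E."""
    return sum(0x20 <= b <= 0x7E for b in data)


if __name__ == "__main__":
    raw = encode("GAME OVER")
    print(raw.hex(" "))          # the bytes as they would sit in the ROM
    print(printable_count(raw))  # how many of them look like ASCII text
    print(decode(raw))           # readable again once the table is known
```

Because none of the encoded bytes fall in the printable ASCII range, a plain printable-run search reports nothing for this text, even though it is stored uncompressed.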
If you use Linux, your system probably already has strings installed (try strings -v). If not, install the "binutils" package using whatever package manager your distribution provides.
For Windows, there are a few options:
- Strings by Sysinternals supports searching for Unicode as well as ASCII.
- The MinGW compiler collection includes a Windows version of strings as part of the GNU binutils package.
- You can use Cygwin. Again, install the "binutils" package from the list of packages that's shown when you run setup.
Once the program is installed, just run strings file (where "file" is the name of the file you want to search, obviously). It prints to stdout, so if you want to save the results to a file, redirect the output with strings file > out.txt or something similar. (If you're using Cygwin on Windows, you'll have to run this in Cygwin's terminal, not the normal Windows command line.)
Most of the output of strings will be random segments of code, graphics, etc. that happen to have a large number of printable characters in a row. It won't list a string unless it has 4 or more characters (configurable with the -n switch), but remember that 95 of the 256 possible byte values (roughly 37.1%) are printable ASCII characters, so the odds of four random bytes all happening to be printable are about 1.9%. That sounds small, but a ROM contains hundreds of thousands of places where such a run could start, so false positives are common.
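The odds can be worked out with a quick calculation. The sketch below assumes every byte value is equally likely, which real ROM data is not, so treat the result as a rough guide only. The function name false_positive_rate is made up for this example.

```python
PRINTABLE_BYTES = 95   # ASCII 0x20 through 0x7E
BYTE_VALUES = 256      # every value a single byte can take


def false_positive_rate(min_len: int = 4) -> float:
    """Chance that min_len uniformly random bytes are all printable."""
    return (PRINTABLE_BYTES / BYTE_VALUES) ** min_len


if __name__ == "__main__":
    # Compare the default minimum length with some longer ones.
    for n in (4, 6, 8):
        print(f"-n {n}: {false_positive_rate(n):.4%}")
```

Raising the minimum length with -n cuts down on junk quickly, at the cost of missing short strings.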