FM-index

The idea of the FM-index is magical and kind of boggling when you first learn about it. Essentially it’s a text file which is stored in a compressed form which also an index for substring search. You can also do regular expression searches, but the technique isn’t very well documented. (It’s possible, in fact, to do all the same kinds of searches that you can do on a suffix array or suffix tree.)

FM-indexes can also be used to store collections of multiple texts which can grow and shrink over time, though the technique isn’t very well documented and has only a few implementations.

Kragen hypothesizes that FM-indexes are only really fast when they’re stored on SSDs, but also says ‘I don’t know if this is really true.’ It seems likely, since they require a lot of random accesses.

These lecture notes are the easiest-to-follow introduction to the FM-index which I’ve found.

The homepage of the DFMI implementation has many useful links, too.