Reputation: 152
I'm curious about the data structure that MongoDB uses to store the full text search index. I couldn't find any description of this in the documentation.
Reading the initial commit, it seems like it used to use a B-tree:
/*
* GO: sets the tree cursors on each term in terms, processes the terms by advancing
* the terms cursors and storing the partial
* results and lastly calculates the top results
* @param results, the priority queue containing the top results
* @param limit, number of results in the priority queue
*/
void FTSSearch::go(Results* results, unsigned limit ) {
vector< shared_ptr<BtreeCursor> > cursors;
for ( unsigned i = 0; i < _query.getTerms().size(); i++ ) {
const string& term = _query.getTerms()[i];
BSONObj min = FTSIndexFormat::getIndexKey( MAX_WEIGHT, term, _indexPrefix );
BSONObj max = FTSIndexFormat::getIndexKey( 0, term, _indexPrefix );
shared_ptr<BtreeCursor> c( BtreeCursor::make( _ns, _id, min, max, true, -1 ) );
cursors.push_back( c );
}
But fts_search.cpp doesn't exist in the current version, and I couldn't find any reference to the data structure in the current implementation.
Is it still a B-tree? (Was it ever?) Is it a trie? Is it something different?
Upvotes: 1
Views: 393
Reputation: 18835
Based on the current latest tag r3.3.9 of fts_index_format.cpp it is still a BTree.
Also worth mentioning, text index tokenizes and stems the terms in the indexed fields for the index entries. text index stores one index entry for each unique stemmed term in each indexed field for each document in the collection.
There are multiple versions of text index, in MongoDB v3.2 text index version 3 is the default version for new text
indexes. See also Text Indexes
Upvotes: 2