Efficient Approach for Fuzzy String Searching with MongoDB

Question

I want to return a list of the closest matching names from my MongoDB database for a given input string, and I want to do this as efficiently as possible. To illustrate, my documents look like this:

const personSchema = new Schema({
    name: { type: String, required: true },
    // ...more fields
});

The input might look like "Barcck Obmaa", for which I would expect a list of people with the person whose name is Barack Obama at the top.

The algorithm should account for the fact that a string that is a prefix of a name matches that name better than a string that is otherwise equally close to the name but not a prefix.
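To make the ranking requirement concrete, here is a minimal sketch of one possible scoring rule: sort by Levenshtein distance and break ties in favor of names that start with the query. The function names `levenshtein` and `rankNames` and the tie-breaking rule itself are assumptions made for illustration, and the sketch runs in application code on names that have already been loaded, not inside MongoDB.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical ranking rule: smaller edit distance first; among equally
// distant names, those that the query is a prefix of come first.
function rankNames(query, names) {
  const q = query.toLowerCase();
  return names
    .map((name) => {
      const n = name.toLowerCase();
      return { name, distance: levenshtein(q, n), isPrefix: n.startsWith(q) };
    })
    .sort(
      (x, y) =>
        x.distance - y.distance || Number(y.isPrefix) - Number(x.isPrefix)
    )
    .map((entry) => entry.name);
}
```

For example, `rankNames("Barcck Obmaa", ["Michelle Obama", "Barack Obama", "Joe Biden"])` puts "Barack Obama" first. This compares the query against every name, so it only illustrates the desired ordering, not an efficient search.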

Several algorithms use a precomputed index to make this kind of search faster. Two that have caught my eye are the Pass-Join index and the BK-tree, which work with string distance measures such as Levenshtein or Jaro-Winkler. It seems to me that there should be some way to integrate these techniques with MongoDB, but there doesn't seem to be any established way of doing so.
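For reference, a BK-tree can be sketched in a few dozen lines. The version below is an in-memory illustration under stated assumptions: it uses Levenshtein distance (BK-tree pruning relies on the triangle inequality, which Levenshtein satisfies), the class and method names are made up for this example, and nothing here is stored in or queried through MongoDB.

```javascript
// Levenshtein distance (two-row dynamic programming).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Illustrative BK-tree: each child edge is labeled with the distance
// between the child's word and its parent's word.
class BKTree {
  constructor(distance) {
    this.distance = distance;
    this.root = null;
  }

  add(word) {
    if (this.root === null) {
      this.root = { word, children: new Map() };
      return;
    }
    let node = this.root;
    while (true) {
      const d = this.distance(word, node.word);
      if (d === 0) return; // word is already in the tree
      const child = node.children.get(d);
      if (!child) {
        node.children.set(d, { word, children: new Map() });
        return;
      }
      node = child;
    }
  }

  // Returns all words within maxDistance of the query, closest first.
  search(query, maxDistance) {
    const results = [];
    const stack = this.root ? [this.root] : [];
    while (stack.length > 0) {
      const node = stack.pop();
      const d = this.distance(query, node.word);
      if (d <= maxDistance) results.push({ word: node.word, distance: d });
      // Triangle inequality: only subtrees whose edge label lies in
      // [d - maxDistance, d + maxDistance] can contain a match.
      for (const [edge, child] of node.children) {
        if (edge >= d - maxDistance && edge <= d + maxDistance) {
          stack.push(child);
        }
      }
    }
    return results.sort((x, y) => x.distance - y.distance);
  }
}
```

A possible usage: build the tree once from the lowercased names, then call `tree.search("barcck obmaa", 3)` to get candidates within three edits. Keeping such a tree in sync with the collection would be a separate problem that this sketch does not address.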

The best solution I could find is an n-gram-based approach described in this article. Is this the best option I have?
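As a rough illustration of what an n-gram approach generally involves (not necessarily what the linked article does), the sketch below splits names into trigrams, builds a filter that selects documents sharing at least one trigram with the query, and scores candidates with a Dice coefficient. The field name `nameNgrams` and all function names are hypothetical; the idea is that each document would store its name's trigrams in an indexed array field.

```javascript
// Split a string into its distinct character n-grams (trigrams by default).
function ngrams(text, n = 3) {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const grams = new Set();
  for (let i = 0; i + n <= normalized.length; i++) {
    grams.add(normalized.slice(i, i + n));
  }
  return [...grams];
}

// Filter for candidate documents: any document whose (hypothetical)
// nameNgrams array shares at least one n-gram with the query.
function buildCandidateFilter(query) {
  return { nameNgrams: { $in: ngrams(query) } };
}

// Dice coefficient over n-gram sets: 2 * |A ∩ B| / (|A| + |B|).
function ngramSimilarity(a, b) {
  const gramsA = new Set(ngrams(a));
  const gramsB = new Set(ngrams(b));
  if (gramsA.size + gramsB.size === 0) return 0;
  let shared = 0;
  for (const gram of gramsA) {
    if (gramsB.has(gram)) shared++;
  }
  return (2 * shared) / (gramsA.size + gramsB.size);
}

// Rank already-fetched candidate names by similarity to the query.
function rankCandidates(query, names) {
  return names
    .map((name) => ({ name, score: ngramSimilarity(query, name) }))
    .sort((x, y) => y.score - x.score);
}
```

The filter would be passed to a find query to narrow the collection down to candidates, and `rankCandidates` would then order those candidates in application code. Whether the ranking can instead be done inside the database depends on the setup and is not covered here.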
