Sushant K

Reputation: 179

How does MongoDB $text search work?

I have inserted the following values in my events collection:

db.events.insert(
   [
     { _id: 1, name: "Amusement Ride", description: "Fun" },
     { _id: 2, name: "Walk in Mangroves", description: "Adventure" },
     { _id: 3, name: "Walking in Cypress", description: "Adventure" },
     { _id: 4, name: "Trek at Tikona", description: "Adventure" },
     { _id: 5, name: "Trekking at Tikona", description: "Adventure" }
   ]
)

I've also created a text index as follows:

db.events.createIndex( { name: "text" } )

Now when I execute the following query (Search - Walk):

db.events.find({
    '$text': {
        '$search': 'Walk'
    },
})

I get these results:

{ _id: 2, name: "Walk in Mangroves", description: "Adventure" },
{ _id: 3, name: "Walking in Cypress", description: "Adventure" }

But when I search Trek:

db.events.find({
    '$text': {
        '$search': 'Trek'
    },
})

I get only one result:

{ _id: 4, name: "Trek at Tikona", description: "Adventure" }

So my question is why the query didn't return both of these documents:

{ _id: 4, name: "Trek at Tikona", description: "Adventure" },
{ _id: 5, name: "Trekking at Tikona", description: "Adventure" }

When I searched for Walk, the results included the documents containing both walk and walking. But when I searched for Trek, the results included only the document containing trek, when they should have included both trek and trekking.

Upvotes: 8

Views: 3504

Answers (1)

Stennie

Reputation: 65353

MongoDB text search uses the Snowball stemming library to reduce words to an expected root form (or stem) based on common language rules. Algorithmic stemming provides a quick reduction, but languages have exceptions (such as irregular or contradicting verb conjugation patterns) that can affect accuracy. The Snowball introduction includes a good overview of some of the limitations of algorithmic stemming.

Your example of walking stems to walk and matches as expected.

However, your example of trekking stems to trekk, so it does not match your search keyword of trek.

You can confirm this by explaining your query and reviewing the parsedTextQuery information which shows the stemmed search terms used:

db.events.find({$text: {$search: 'Trekking'} }).explain().queryPlanner.winningPlan.parsedTextQuery
{
    "terms" : [
        "trekk"
    ],
    "negatedTerms" : [ ],
    "phrases" : [ ],
    "negatedPhrases" : [ ]
}

You can also check expected Snowball stemming using the online Snowball Demo or by finding a Snowball library for your preferred programming language.

To work around exceptions that might commonly affect your use case, you could consider adding another field to your text index with keywords to influence the search results. For this example, you would add trek as a keyword so that the event described as trekking also matches in your search results.
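
As a rough sketch of that workaround (the keywords field name and the addTrekKeyword helper are assumptions for illustration, not part of the original data), you could rebuild the text index to cover a second field and tag the affected document. A collection can have only one text index, so the existing one has to be dropped first:

```javascript
// Sketch of the keyword workaround; `keywords` is a hypothetical field name.
// Pass the shell's `db` object, e.g. addTrekKeyword(db).
function addTrekKeyword(db) {
  // A collection can have only one text index, so drop the existing one
  // ("name_text" is the default name generated for { name: "text" }).
  db.events.dropIndex("name_text");

  // Recreate the text index over both the name and the new keywords field.
  db.events.createIndex({ name: "text", keywords: "text" });

  // Tag the document whose name stems to "trekk" with the keyword "trek".
  db.events.updateOne({ _id: 5 }, { $set: { keywords: ["trek"] } });

  // Re-run the original search and return the matching documents.
  return db.events.find({ $text: { $search: "Trek" } }).toArray();
}
```

In the shell you would call addTrekKeyword(db). With the sample data from the question, the search for Trek should then match _id 4 through name and _id 5 through keywords.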

There are other approaches for more accurate inflection which are generally referred to as lemmatization. Lemmatization algorithms are more complex and start heading into the domain of natural language processing. There are many open source (and commercial) toolkits that you may be able to leverage if you want to implement more advanced text search in your application, but these are outside the current scope of the MongoDB text search feature.

Upvotes: 11
