MongoDB text index search

Question

I created a collection in MongoDB as shown below:

db.articles.insert([
 { _id: 1, subject: "one", author: "abc", views: 50 },
 { _id: 2, subject: "lastone", author: "abc", views: 5 },
 { _id: 3, subject: "firstone", author: "abc", views: 90  },
 { _id: 4, subject: "everyone", author: "abc", views: 100 },
 { _id: 5, subject: "allone", author: "efg", views: 100 },
 { _id: 6, subject: "noone", author: "efg", views: 100 },
 { _id: 7, subject: "nothing", author: "abc", views: 100 }])

After that, I created a text index on the subject and author fields.

db.articles.createIndex(
    {subject: "text",
    author: "text"})

Now I am trying to search for the word "one" in the indexed fields. When I execute the query ...

db.articles.count({$text: {$search: "\"one\""}})

... the result is 1.

The problem is that when I want combination of words "one", "abc" ...

db.articles.count({$text: {$search: "\"one\" \"abc\""}})

... it gives the result 4, which includes the records whose subject is "lastone", "firstone", "everyone" or "one".

So my question is: why doesn't the first query fetch 4 records? And how can I write a query that fetches the 4 records containing the word "one"?

glytching · Accepted Answer

This command ...

db.articles.count({$text: {$search: "\"one\""}})

... will count the documents having the exact phrase "one". There is only one such document, hence the result is 1.

Querying with the value "one" should only return one document, since there is only one document containing either "one" or some value for which "one" is a stem. From the docs:

For case insensitive and diacritic insensitive text searches, the $text operator matches on the complete stemmed word. So if a document field contains the word blueberry, a search on the term blue will not match. However, blueberry or blueberries will match.

Looking at the documents in your question ...

one is not a stem of everyone
one is not a stem of lastone
one is not a stem of allone
one is not a stem of firstone
one is not a stem of noone

... so none of these documents will be matched for the value one.
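To make the whole-word behaviour concrete, here is a minimal in-memory sketch in plain JavaScript (no MongoDB involved). The helpers `tokens` and `matchesTerm` are hypothetical names introduced for illustration, and the model is a deliberate simplification: it only compares whole lowercase words and does not reproduce MongoDB's language-specific stemming.

```javascript
// Sample documents copied from the question (views omitted for brevity).
const articles = [
  { _id: 1, subject: "one", author: "abc" },
  { _id: 2, subject: "lastone", author: "abc" },
  { _id: 3, subject: "firstone", author: "abc" },
  { _id: 4, subject: "everyone", author: "abc" },
  { _id: 5, subject: "allone", author: "efg" },
  { _id: 6, subject: "noone", author: "efg" },
  { _id: 7, subject: "nothing", author: "abc" },
];

// Hypothetical helper: split the two indexed fields into lowercase words.
// Simplification: real text indexes also stem each word.
function tokens(doc) {
  return `${doc.subject} ${doc.author}`.toLowerCase().split(/\s+/);
}

// A term matches only when it equals a whole token, never a substring.
function matchesTerm(doc, term) {
  return tokens(doc).includes(term.toLowerCase());
}

const oneMatches = articles.filter((doc) => matchesTerm(doc, "one"));
console.log(oneMatches.map((doc) => doc._id)); // [ 1 ]
```

Under this simplified model "lastone" is a single token, so the term "one" never matches it, which is consistent with the count of 1 above.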

You can, of course, query with multiple values. For example:

The docs suggest that this should be evaluated as one or abc and it correctly returns 5:
```
db.articles.count({$text: {$search: "one abc"}})
```
The docs suggest that this should be evaluated as "abc" AND ("abc" or "one") and it correctly returns 5:
```
db.articles.count({$text: {$search: "\"abc\" one"}})
```
The docs suggest that this should be evaluated as "one" AND ("one" or "abc") but it somehow returns 4:
```
db.articles.count({$text: {$search: "\"one\" abc"}})
```
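
As a rough cross-check of the three readings above, the boolean expressions can be evaluated over the sample documents in plain JavaScript. This is only a sketch under the assumption that a term or phrase matches a field when it equals the whole field value; `hasWord` and `count` are hypothetical helpers, and nothing here calls MongoDB.

```javascript
// Sample documents copied from the question (views omitted for brevity).
const docs = [
  { _id: 1, subject: "one", author: "abc" },
  { _id: 2, subject: "lastone", author: "abc" },
  { _id: 3, subject: "firstone", author: "abc" },
  { _id: 4, subject: "everyone", author: "abc" },
  { _id: 5, subject: "allone", author: "efg" },
  { _id: 6, subject: "noone", author: "efg" },
  { _id: 7, subject: "nothing", author: "abc" },
];

// Assumption: a word matches only if it equals an entire indexed field value.
const hasWord = (doc, word) => [doc.subject, doc.author].includes(word);
const count = (predicate) => docs.filter(predicate).length;

// "one abc" read as: one OR abc
const eitherTerm = count((d) => hasWord(d, "one") || hasWord(d, "abc"));

// "\"abc\" one" read as: "abc" AND ("abc" OR one)
const phraseAbc = count(
  (d) => hasWord(d, "abc") && (hasWord(d, "abc") || hasWord(d, "one"))
);

// "\"one\" abc" read as: "one" AND ("one" OR abc)
const phraseOne = count(
  (d) => hasWord(d, "one") && (hasWord(d, "one") || hasWord(d, "abc"))
);

console.log(eitherTerm, phraseAbc, phraseOne); // 5 5 1
```

Under this whole-word model the third expression yields 1 rather than the 4 that MongoDB reports, which is the discrepancy discussed here.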

In the last example MongoDB includes the documents whose subject is "one", "lastone", "firstone" or "everyone" but excludes the document with subject "nothing". This suggests that it has somehow deemed "one" to be a stem of "lastone", "firstone" and "everyone", yet count({$text: {$search: "one"}}) returns 1, which clearly indicates that "one" is not seen as a stem of "lastone", "firstone" and "everyone".

I suspect this might be a bug and might be worth raising with MongoDB.

FWIW, it's possible that what you actually want is a partial string search, in which case $regex might work. The following query ...

db.articles.count({ subject: { $regex: /one$/ }, author: { $regex: /abc$/ } })

... means something like count where subject like '%one' and author like '%abc' (the $ anchors each pattern to the end of the value). For your documents it returns 4, i.e. the documents whose subject ends in "one" ("one", "lastone", "firstone", "everyone") and whose author is "abc".
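
The effect of those two patterns can be checked without a database by applying the same regular expressions to the sample documents in plain JavaScript. This is an illustrative sketch only; the variable names are assumptions, and it mimics the filter rather than calling $regex itself.

```javascript
// Sample documents copied from the question (views omitted for brevity).
const sample = [
  { _id: 1, subject: "one", author: "abc" },
  { _id: 2, subject: "lastone", author: "abc" },
  { _id: 3, subject: "firstone", author: "abc" },
  { _id: 4, subject: "everyone", author: "abc" },
  { _id: 5, subject: "allone", author: "efg" },
  { _id: 6, subject: "noone", author: "efg" },
  { _id: 7, subject: "nothing", author: "abc" },
];

// Same patterns as in the query: the trailing $ anchors to the end of the value.
const subjectPattern = /one$/;
const authorPattern = /abc$/;

// Both conditions must hold, mirroring the implicit AND of the query document.
const regexMatches = sample.filter(
  (doc) => subjectPattern.test(doc.subject) && authorPattern.test(doc.author)
);

console.log(regexMatches.map((doc) => doc.subject));
// [ 'one', 'lastone', 'firstone', 'everyone' ]
```

"allone" and "noone" also end in "one" but are filtered out by the author condition, and "nothing" fails the subject condition, leaving 4 documents.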
