naughty boy

Reputation: 2149

MongoDB: text index with arrays, only the first term is indexed

I have documents with the following schema:

{
  description : String,
  tags : [String]
}

I have indexed both fields as text. The problem is that when I search for a specific string within the array, the document is returned only if that string is the first element of the array. It therefore seems that the text index only covers the first element. Is this how MongoDB inherently works, or is there an option that must be passed when creating the index?

Example document

{
   description : 'random description',
   tags : ["hello", "there"]
}

The object used to create the index

{description : 'text', tags : 'text'}

The query

db.myCollection.find({$text : {$search : 'hello'}});

returns a document but

db.myCollection.find({$text : {$search : 'there'}});

does not return anything.

I am using MongoDB version 2.6.11.

I have other indexes, but this is the only text index. Here is the corresponding entry from the output of db.myCollection.getIndexes():

{
        "v" : 1,
        "key" : {
                "_fts" : "text",
                "_ftsx" : 1
        },
        "name" : "description_text_tags_text",
        "ns" : "myDB.myCollection",
        "weights" : {
                "description" : 1,
                "tags" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 2
}

Upvotes: 5

Views: 1041

Answers (1)

Nipun Talukdar

Reputation: 5387

This has nothing to do with whether the string is the first or second element of the array. The word "there" is in the stop-word list for the "english" language, so it is never added to the index. Text indexing stems the text and removes stop words before the terms are added to the text index, and both steps are language dependent.

You may want to create the text index as follows:
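To make the stop-word effect concrete, here is a toy model of term extraction. It is only a sketch and not MongoDB's actual implementation: the function name indexedTerms and the four-word stop list are assumptions made for illustration, and stemming is left out entirely.

```javascript
// Toy model of text-index term extraction. NOT MongoDB's implementation:
// the stop-word list is a tiny hypothetical sample and stemming is omitted.
var SAMPLE_ENGLISH_STOP_WORDS = ["a", "the", "there", "is"];

// Returns the distinct terms that would be indexed for an array of strings.
function indexedTerms(values, language) {
  var terms = [];
  values.forEach(function (value) {
    // Simple tokenization: lowercase, then split on non-alphanumerics.
    value.toLowerCase().split(/[^a-z0-9]+/).forEach(function (token) {
      if (token === "") return;
      // Stop words are only dropped when a real language is selected.
      var isStopWord = language === "english" &&
        SAMPLE_ENGLISH_STOP_WORDS.indexOf(token) !== -1;
      if (!isStopWord && terms.indexOf(token) === -1) terms.push(token);
    });
  });
  return terms;
}

var tags = ["hello", "there"];
console.log(indexedTerms(tags, "english")); // [ 'hello' ]
console.log(indexedTerms(tags, "none"));    // [ 'hello', 'there' ]
```

Every element of the array is tokenized in the same way. The position of "there" in the array plays no role; it is dropped only because it is a stop word.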

db.myCollection.ensureIndex({description : 'text', tags : 'text'}, { default_language: "none" }) 

If "none" is used as the default language, the text indexing process performs only simple tokenization and does not use any stop-word list or stemming. By default, "english" is used as the "default_language" for a text index.
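Because a collection can have only one text index, the existing index has to be dropped before one with different options can be created. A minimal sketch of the full sequence for the mongo shell follows; the wrapper name rebuildTextIndex is hypothetical, and the index name is taken from the getIndexes() output in the question.

```javascript
// Sketch for the mongo shell; rebuildTextIndex is a hypothetical helper name.
// Pass the shell's global `db` object as the argument.
function rebuildTextIndex(db) {
  // A collection can have only one text index, so drop the existing one first.
  db.myCollection.dropIndex("description_text_tags_text");

  // Recreate it without language-specific stemming or stop words.
  db.myCollection.ensureIndex(
    { description: "text", tags: "text" },
    { default_language: "none" }
  );

  // With "none", the term "there" is indexed, so this search should now
  // match the example document.
  return db.myCollection.find({ $text: { $search: "there" } }).toArray();
}
```

In the shell this would be called as rebuildTextIndex(db). Note that with "none" no stemming is applied either, so a search for "tag" would no longer match a document that only contains "tags".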

Upvotes: 3
