danyolgiax

Reputation: 13086

TextSearch engine in MongoDb

I'm trying to create a search feature in my application using the C# driver for MongoDB.

When a user searches for a list of words, I want to display the exact match first (if it exists), followed by the most relevant posts.

I have a text index like this:

collection.CreateIndex(IndexKeys<Post>.Text(p => p.BodyPlainText));

I started this way:

var textSearchQueryExact = Query.Matches("BodyPlainText", searchString);
var textSearchQueryFullText = Query.Text(searchString);
var textSearchQuery = Query.Or(textSearchQueryFullText, textSearchQueryExact);

This generates the following query:

{  "$or" : [ { "$text" : { "$search" : "My example text" } }, { "BodyPlainText" : /My example text/ }] }
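As a sketch in plain JavaScript (no database needed), the same filter document can be built like this. The helpers `escapeRegex` and `buildSearchFilter` are hypothetical, and the regex escaping is an assumption not present in the original C# code: it makes characters such as `.` or `(` in the user's input match literally instead of acting as regex syntax.

```javascript
// Hypothetical helper: escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the same { $or: [...] } shape the C# code produces,
// one $text clause plus one regex clause on the given field.
function buildSearchFilter(field, searchString) {
  return {
    $or: [
      { $text: { $search: searchString } },
      { [field]: new RegExp(escapeRegex(searchString)) },
    ],
  };
}

const filter = buildSearchFilter("BodyPlainText", "My example text");
console.log(filter.$or[1].BodyPlainText); // /My example text/
```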

But it doesn't work! I get zero results, and in MongoVUE I get no explain output. If I remove either of the two clauses the query works, but with both it doesn't.

I also found this restriction in the MongoDB documentation:

"To use a $text query in an $or expression, all clauses in the $or array must be indexed."

but the property is the same in both clauses and it is indexed.

What am I missing?

Is this the right way to achieve the result I want?

Update

Using only the text search query, with this syntax:

{ "$text" : { "$search" : "il ricorrente lamentava che mentre" } }

gives me the wrong result, due to stemming and stop-word processing (I think).

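One thing that may be worth checking: a plain `$search` string of several words is treated as a logical OR of the individual terms, so a document containing any one of them can match. Wrapping the text in double quotes asks `$text` for a phrase match instead. A minimal sketch, where the helper `phraseSearch` is hypothetical and stripping embedded quotes is an assumption made so user input cannot break the phrase delimiter:

```javascript
// Hypothetical helper: wrap the user's text in double quotes so $text treats it
// as a phrase rather than as an OR of individual terms.
function phraseSearch(text) {
  // Remove embedded double quotes; they would otherwise end the phrase early.
  const phrase = '"' + text.replace(/"/g, "") + '"';
  return { $text: { $search: phrase } };
}

const phraseFilter = phraseSearch("il ricorrente lamentava che mentre");
console.log(phraseFilter.$text.$search); // "il ricorrente lamentava che mentre"
```

In the mongo shell the resulting object would be passed to `find()` as the filter.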

Upvotes: 2

Views: 2538

Answers (2)

Valentyn Kahamlyk

Reputation: 116

Try setting the default language on the index:

db.CollectionName.ensureIndex(
   { BodyPlainText: "text" },
   { default_language: "it" }
)

And in the query:

{ $text: { $search: "il ricorrente lamentava che mentre", $language: "it" } }

Then stemming will help you instead of getting in the way.
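A small sketch of keeping the two settings in sync: the index's `default_language` and the query's `$language` come from one value, so indexing and searching use the same stemmer and stop-word list. The helper `languageTextSearch` is hypothetical; the option and operator names are the ones used in the answer above.

```javascript
// Hypothetical helper: index spec and query pinned to the same text-search language.
function languageTextSearch(field, searchString, language = "it") {
  return {
    indexKeys: { [field]: "text" },
    indexOptions: { default_language: language },
    query: { $text: { $search: searchString, $language: language } },
  };
}

const spec = languageTextSearch("BodyPlainText", "il ricorrente lamentava che mentre");
// In the mongo shell these would be used roughly as:
//   db.CollectionName.ensureIndex(spec.indexKeys, spec.indexOptions)
//   db.CollectionName.find(spec.query)
console.log(JSON.stringify(spec.query));
```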

Upvotes: 1

Neil Lunn

Reputation: 151112

You can do this pretty much exactly as the error message says you should. This is part of how "index intersection" works in MongoDB 2.6 and later. It was "sort of" there for $or queries before, but there is a lot more going on "under the hood" now. So basically, add an index as the error asks: the text index cannot serve the regex clause, so that field also needs an ordinary index.

Given this data:

db.example.insert({ "text": "This is what I want" })

Then add the indexes:

db.example.ensureIndex({ "text": "text" })
db.example.ensureIndex({ "text": 1 })
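To make the "all clauses must be indexed" rule concrete, here is a deliberately simplified, hypothetical model of it. The function `unindexedClauses` is not a MongoDB API and real query planning is far more involved; it only checks that a `$text` clause has some text index and that every other clause's fields are the leading key of an ordinary index.

```javascript
// Simplified, hypothetical model of "all $or clauses must be indexed" when one is $text.
function unindexedClauses(orClauses, indexes) {
  const hasTextIndex = indexes.some((keys) => Object.values(keys).includes("text"));
  // Fields that are the leading key of an ordinary (non-text) index.
  const regularFields = new Set(
    indexes
      .map((keys) => Object.entries(keys)[0])
      .filter(([, kind]) => kind !== "text")
      .map(([field]) => field)
  );
  // Return the clauses that no index in the list could serve.
  return orClauses.filter((clause) => {
    if ("$text" in clause) return !hasTextIndex;
    return !Object.keys(clause).every((field) => regularFields.has(field));
  });
}

const clauses = [
  { $text: { $search: "This is what I want" } },
  { text: /This is what I want/ },
];
console.log(unindexedClauses(clauses, [{ text: "text" }]).length); // 1 (the regex clause)
console.log(unindexedClauses(clauses, [{ text: "text" }, { text: 1 }]).length); // 0
```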

So then the query works as you would expect:

db.example.find(
    { 
        "$or": [ 
            { "$text": { "$search": "This is what I want" } }, 
            { "text": /This is what I want/  }
        ]
    },
    { "score": { "$meta": "textScore" } }
).pretty()

Note that I added the $meta projection even though the .sort() is omitted. But the point I was making in the comments is that this query returns the same result:

db.example.find(
    { 
        "$or": [
            { "$text": { "$search": "This is what I want" } }
        ]
    },
    { "score": { "$meta": "textScore" } }
).pretty()

So although the first example tries to "intersect", the score of the exact-match document stays the same:

{
    "_id" : ObjectId("53870b75015cb64be54d7ecf"),
    "text" : "This is what I want",
    "score" : 1
}
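Since the .sort() was left out above, here is a sketch of the documents that would express it. In the 2.6-era shell, sorting by `{ $meta: "textScore" }` also required the same `$meta` expression in the projection, which is why both are built from one value. The helper `rankedTextQuery` is hypothetical.

```javascript
// Hypothetical helper: filter, projection and sort for ranking by text score.
// Shell usage would be roughly: db.example.find(q.filter, q.projection).sort(q.sort)
function rankedTextQuery(searchString) {
  const score = { $meta: "textScore" };
  return {
    filter: { $text: { $search: searchString } },
    projection: { score },
    sort: { score },
  };
}

const q = rankedTextQuery("This is what I want");
console.log(JSON.stringify(q.sort)); // {"score":{"$meta":"textScore"}}
```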

For more of a "stemming" example, as you mentioned, consider this:

db.example.insert({ "text": "these colors are mine" })
db.example.insert({ "text": "This color are mine" })

And both query forms:

db.example.find(
    { 
        "$or": [
            { "$text": { "$search": "This color are mine" } },
            { "text": /This color are mine/  }
        ]
    },
    { "score": { "$meta": "textScore" }}
).pretty()

db.example.find(
    { 
        "$or": [
            { "$text": { "$search": "This color are mine" } }
         ]
    },
    { "score": { "$meta": "textScore" }}
).pretty()

The $or in the single-clause queries is superfluous, but it's quick to copy and paste. Again, both query forms return the same scores:

{
    "_id" : ObjectId("53870f5a015cb64be54d7ed0"),
    "text" : "these colors are mine",
    "score" : 1.5
},
{
    "_id" : ObjectId("5387114b015cb64be54d7ed1"),
    "text" : "This color are mine",
    "score" : 1.5
}

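So that is how the rankings work out with that form of query: adding the regex clause to the $or changes which documents match, but not their text score, so the exact match is not ranked ahead of the stemmed one.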
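If exact matches must come first, one option is to re-rank on the client. A minimal sketch, assuming each result carries the `score` field projected with `{ $meta: "textScore" }`; the function `rankResults` is hypothetical, and its case-sensitive substring test is chosen to mirror the case-sensitive regex clause. It uses the two sample documents shown above.

```javascript
// Hypothetical client-side re-ranking: exact (case-sensitive substring) matches first,
// then the remaining documents by descending text score.
function rankResults(results, searchString, field = "text") {
  const isExact = (doc) => doc[field].includes(searchString);
  return [...results].sort((a, b) => {
    const exactDiff = Number(isExact(b)) - Number(isExact(a));
    return exactDiff !== 0 ? exactDiff : b.score - a.score;
  });
}

const results = [
  { text: "these colors are mine", score: 1.5 },
  { text: "This color are mine", score: 1.5 },
];
console.log(rankResults(results, "This color are mine").map((d) => d.text));
// [ 'This color are mine', 'these colors are mine' ]
```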

Upvotes: 2
