Julio Endara

Reputation: 65

MongoDB: How to do a text search and sort by a date

Context: I have a MongoDB database populated with a large number of emails. I'd like to search for all emails that include a given email address in any of the following fields: To, From, CC and BCC. The result needs to be sorted by the field Date. We're currently trying the following query:

db.collection.find({ $text : {$search: "\"[email protected]\""}}).sort({Date:1})

I've tried doing a compound index including the date but it does not work.

With this index...

db.collection.createIndex({Date: 1, From:"text", To:"text", CC:"text", BCC:"text"})

it gives error 17007 as Date should have an equality match as it's a prefix. This is not an option as we'd like all emails regardless of the date.

Also with this other index...

db.collection.createIndex({From:"text", To:"text", CC:"text", BCC:"text", Date:1})

Then it gives error 17144 as it goes over the internal limit for the sort.

We've read the following:

Stackoverflow ref

Stackoverflow ref

mongoDB doc on compound index

In these references and others I'm getting the idea that this is not possible but I don't think what we're trying to do is atypical or so much out of the box.

Are we doing something wrong? Is there a way to do this query with compound index or any other MongoDB feature?

thanks!

Upvotes: 3

Views: 4445

Answers (1)

Blakes Seven

Reputation: 50406

Regardless of other compound index keys, you need to include the $meta for the "textScore" in order to get the correct sorting:

db.collection.find(
    { "$text": { "$search": "\"[email protected]\""}},
    { "score": { "$meta": "textScore" } }
).sort({
    "score": { "$meta": "textScore" }, "Date": 1
})

So naturally you want that "score" to sort first, and then by "Date" in order for things to be correctly ranked by relevance of the search.

The order of the index keys does not matter, but of course you can only have "one" text index per collection. So make sure you drop all others before creating:

db.collection.createIndex({
    "From": "text",
    "To": "text",
    "CC": "text",
    "BCC": "text",
    "Date": 1
})

Look for the indexes that currently exist with:

db.collection.getIndexes()

Or just drop everything and start fresh:

db.collection.dropIndexes()
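
If dropping every index is more than you want, something like the following could pick out just the text index by name. This is only a sketch: `findTextIndexNames` is a made-up helper name, the sample array is illustrative data shaped like `getIndexes()` output, and it assumes a text index reports the value "text" somewhere in its `key` document (the shell normally shows it as `_fts: "text"`).

```javascript
// Hypothetical helper (not part of the shell): return the names of all
// text indexes found in an array shaped like getIndexes() output.
function findTextIndexNames(indexes) {
  return indexes
    .filter((ix) => Object.values(ix.key).includes("text"))
    .map((ix) => ix.name);
}

// Illustrative sample only; real output comes from db.collection.getIndexes().
const sampleIndexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  {
    v: 2,
    key: { _fts: "text", _ftsx: 1, Date: 1 },
    name: "From_text_To_text_CC_text_BCC_text_Date_1",
  },
];

const textIndexNames = findTextIndexNames(sampleIndexes);

// In the shell, against the real collection, the usage would be roughly:
// findTextIndexNames(db.collection.getIndexes())
//     .forEach((name) => db.collection.dropIndex(name));
```

That leaves any non-text indexes in place.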

For the data you appear to be searching on though, I would have thought a regular compound index on each field should suit you better. Looking for "email" addresses should be an "exact match", and if you expect multiple items for each field then they should be arrays of strings, like so:

{
    "TO": ["[email protected]"],
    "FROM": ["[email protected]"],
    "CC": ["[email protected]","[email protected]"],
    "BCC": [],
    "Date": ISODate("2015-07-27T13:42:05.535Z")
}

Then you need separate indexes on each field, possibly in compound with "Date" like so:

db.email.createIndex({ "TO": 1, "Date": 1 })
db.email.createIndex({ "FROM": 1, "Date": 1 })
db.email.createIndex({ "CC": 1, "Date": 1 })
db.email.createIndex({ "BCC": 1, "Date": 1 })

And query with an $or condition:

db.email.find({
    "$or": [
        { "TO": "[email protected]" },
        { "FROM": "[email protected]" },
        { "CC": "[email protected]" },
        { "BCC": "[email protected]" }
    ],
    "Date": { "$lt": new Date() }
})
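
Since the same address is repeated in every branch, the filter can also be built by a small function. A minimal sketch, assuming the field names from the sample document above; `buildAddressFilter` and the address `someone@example.com` are placeholders for illustration, not anything from the original collection.

```javascript
// Hypothetical helper: build the $or filter for one address across all
// address fields, with an upper bound on "Date" as in the query above.
function buildAddressFilter(address, before, fields = ["TO", "FROM", "CC", "BCC"]) {
  return {
    $or: fields.map((field) => ({ [field]: address })),
    Date: { $lt: before },
  };
}

// Placeholder address, purely for illustration.
const addressFilter = buildAddressFilter("someone@example.com", new Date());

// In the shell, with the sort by date the question asks for:
// db.email.find(addressFilter).sort({ "Date": 1 })
```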

If you look at the .explain(true) (verbose) output from that, you should see that the winning plan is an "index intersection" of all the specified indexes. This works out to be very efficient as every field ( and index selected ) has an exact match value, and a range match on the indexed date.

That's going to be a lot better for you than the "fuzzy matching" of text searches. Even regular expressions should work better here in general ( for e-mail addresses ) and especially if they are "anchored" ^ to the start of the string.
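
For the anchored regular expression case, the address needs its special characters escaped first, since "." would otherwise match any character. A rough sketch; `escapeRegex` and `anchoredPrefix` are hypothetical helper names and the values are placeholders.

```javascript
// Escape regex metacharacters so the text is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: case-sensitive match anchored to the start of the
// string, expressed as a $regex condition on a single field.
function anchoredPrefix(field, prefix) {
  return { [field]: { $regex: "^" + escapeRegex(prefix) } };
}

// Placeholder value, purely for illustration.
const fromCondition = anchoredPrefix("FROM", "some.one@example");

// In the shell:
// db.email.find(fromCondition).sort({ "Date": 1 })
```

Leaving the pattern case-sensitive and anchored matters here, because that is what allows the match to be bounded by the index rather than testing every key.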

Text indexes are meant for "word like tokens" to match, but this should not be your data. The $or does not look as nice, but it should do a much better job.

Upvotes: 6
