Nabor
Nabor

Reputation: 1701

MongoDB find slow with regex

The system I currently develop on, uses MongoDB 2.4.4

I have a collection of users.

There is a combined index: { "LASTNAME" : 1 , "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1} I also tried with single indexes, without a performance increase

The system contains 400.000 test records.

The query (Java debug from org.springframework.data.mongodb.core.query.Query):

{ "LASTNAME" : { "$regex" : "^Schm"}},
  Fields: { "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1 ,"LASTNAME" : 1},
  Sort: { "LASTNAME" : 1 , "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1}

performs within 16ms.That is fantastic.

This query is not shown in the MongoDB console (no debug information to post here).

But, I like to search not only starts with, it also should be case insensitive.

The query:

{ "LASTNAME" : { "$regex" : "^Schm" , "$options" : "i"}},
  Fields: { "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1 , "LASTNAME" : 1},
  Sort: { "LASTNAME" : 1 , "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1}

performs within 897ms. That is unacceptable slow.

Mongo Console shows this:

query: { query: { LASTNAME: /^Schm/i },
  orderby: { LASTNAME: 1, FIRSTNAME: 1, EMAIL: 1, CITY:1, STATUS: 1 }
} cursorid:1252405545564528 ntoreturn:25 ntoskip:0 nscanned:297651
keyUpdates:0 numYields: 1 locks(micros) r:1391715 nreturned:25 reslen:4422 897ms

As one can see. It's not the scanAndOrder problem that points onto index problems.

Then I tried to solve it the next way what fits the most scenarios (inserted from the user, lower case and upper case), but that is slower also. My expectation was, that it performs thrice as long as the first query.

The query:

{ "$or" : [ { "LASTNAME" : { "$regex" : "^Schm"}} , { "LASTNAME" : { "$regex" : "^schm"}} , { "LASTNAME" : { "$regex" : "^SCHM"}}]},
  Fields: { "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1 , "LASTNAME" : 1},
  Sort: { "LASTNAME" : 1 , "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1}

performs within 1300ms. Nothing more to say.

MongoDB console:

query: { query: { $or: [ { LASTNAME: /^Schm/ }, { LASTNAME: /^schm/ }, { LASTNAME: /^SCHM/ } ] },
  orderby: { LASTNAME: 1, FIRSTNAME: 1, EMAIL: 1, CITY: 1, STATUS: 1 }
} cursorid:43560166842085 ntoreturn:25 ntoskip:0 nscanned:297651
keyUpdates:0 numYields: 1 locks(micros) r:1531168 nreturned:25 reslen:4422 1300ms

So, how can I search case insensitive that nearly has the speed of the first search? Maximal 150ms!

Upvotes: 0

Views: 9376

Answers (1)

Colin M
Colin M

Reputation: 13348

As soon as you add case insensitivity, you can no longer use an index. This is an important design concern when building an application that needs to have support for searching.

To overcome this, you should store an already-lowercased version of the last name in another field, and query against that case-sensitively (obviously translating all of your search queries to lowercase before passing them to Mongo).

Edit

It looks like text search has been added in 2.4. Read about it here and see if that will do what you need.

As a side note, if you are really concerned with performance (which, judging by your question, it sounds like you might be) you should really reconsider searching against your data storage engine. Consider an alternative search engine such as ElasticSearch (or a simple Lucene index) to keep search traffic off of your primary data store.

Upvotes: 11

Related Questions