Reputation: 1701
The system I currently develop on, uses MongoDB 2.4.4
I have a collection of users.
There is a combined index: { "LASTNAME" : 1 , "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1}
I also tried with single indexes, without a performance increase
The system contains 400.000 test records.
The query (Java debug from org.springframework.data.mongodb.core.query.Query):
{ "LASTNAME" : { "$regex" : "^Schm"}},
Fields: { "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1 ,"LASTNAME" : 1},
Sort: { "LASTNAME" : 1 , "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1}
performs within 16ms.That is fantastic.
This query is not shown in the MongoDB console (no debug information to post here).
But, I like to search not only starts with, it also should be case insensitive.
The query:
{ "LASTNAME" : { "$regex" : "^Schm" , "$options" : "i"}},
Fields: { "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1 , "LASTNAME" : 1},
Sort: { "LASTNAME" : 1 , "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1}
performs within 897ms. That is unacceptable slow.
Mongo Console shows this:
query: { query: { LASTNAME: /^Schm/i },
orderby: { LASTNAME: 1, FIRSTNAME: 1, EMAIL: 1, CITY:1, STATUS: 1 }
} cursorid:1252405545564528 ntoreturn:25 ntoskip:0 nscanned:297651
keyUpdates:0 numYields: 1 locks(micros) r:1391715 nreturned:25 reslen:4422 897ms
As one can see. It's not the scanAndOrder problem that points onto index problems.
Then I tried to solve it the next way what fits the most scenarios (inserted from the user, lower case and upper case), but that is slower also. My expectation was, that it performs thrice as long as the first query.
The query:
{ "$or" : [ { "LASTNAME" : { "$regex" : "^Schm"}} , { "LASTNAME" : { "$regex" : "^schm"}} , { "LASTNAME" : { "$regex" : "^SCHM"}}]},
Fields: { "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1 , "LASTNAME" : 1},
Sort: { "LASTNAME" : 1 , "FIRSTNAME" : 1 , "EMAIL" : 1 , "CITY" : 1 , "STATUS" : 1}
performs within 1300ms. Nothing more to say.
MongoDB console:
query: { query: { $or: [ { LASTNAME: /^Schm/ }, { LASTNAME: /^schm/ }, { LASTNAME: /^SCHM/ } ] },
orderby: { LASTNAME: 1, FIRSTNAME: 1, EMAIL: 1, CITY: 1, STATUS: 1 }
} cursorid:43560166842085 ntoreturn:25 ntoskip:0 nscanned:297651
keyUpdates:0 numYields: 1 locks(micros) r:1531168 nreturned:25 reslen:4422 1300ms
So, how can I search case insensitive that nearly has the speed of the first search? Maximal 150ms!
Upvotes: 0
Views: 9376
Reputation: 13348
As soon as you add case insensitivity, you can no longer use an index. This is an important design concern when building an application that needs to have support for searching.
To overcome this, you should store an already-lowercased version of the last name in another field, and query against that case-sensitively (obviously translating all of your search queries to lowercase before passing them to Mongo).
Edit
It looks like text search has been added in 2.4. Read about it here and see if that will do what you need.
As a side note, if you are really concerned with performance (which, judging by your question, it sounds like you might be) you should really reconsider searching against your data storage engine. Consider an alternative search engine such as ElasticSearch (or a simple Lucene index) to keep search traffic off of your primary data store.
Upvotes: 11