Reputation: 4108
I'm a new user of mongo (with extensive experience with mysql) and it seems to have some interesting pros and cons. The pros are, apparently, the size of the data you can hold and the speed at which you can write records. I have an application where I'm writing many logs to a collection and I have about 7m so far. My problem is that a seemingly simple query takes a really REALLY long time. Let me explain.
My collection has 7m documents:
> db.alpha2.count()
7257619
Now I want to count all records for a given cid and that has a timestamp less than some number (this example has the timestamp in the future, so it should count everything):
> db.alpha2.find({'ts': {'$lt': 1446457607}, 'cid': '2636518'}).count()
7257619
This is the problem query, it takes a full 58 seconds to return this number to me! Conceptually, this is a very simple query and would be somewhat equivalent to this in sql world:
select count(*) from alpha2 where cid=2636518 and ts<1446457607
I don't have an equivalent table, but I think that would take less than 0.1 seconds to run in mysql based on my experience. So what do I do? I plan on doing a lot of aggregation counting on datasets much bigger than 7m records. I'm also doing some slightly harder stuff (map reduce) and that's much much worse (a few minutes). I need this to be less than a second. What am I doing wrong? Is this time cost expected with mongo?
I put an index on the ts value with this prior to my timing on the queries above:
db.alpha2.ensureIndex({ts:1})
Upvotes: 0
Views: 732
Reputation: 65403
In order to determine the count()
, MongoDB will have to find all matching documents.
You can explain() the query to see how the indexes are used:
db.alpha2.find({'ts': {'$lt': 1446457607}, 'cid': '2636518'}).explain()
In particular you want to minimize the nscannedObjects
(number of documents scanned).
Your best case standard scenario is to index on all fields involved in the count (and ensure the index fits in available RAM).
So, your index should include cid
as well:
db.alpha2.ensureIndex({ts:1, cid:1})
If you are doing frequent counts, you may be better storing and updating these via a process like an incremental map/reduce if that works for your use case.
Upvotes: 4