Reputation: 1053
My collection is described as follows:
{ "_id" : ObjectId("5474af69d4b28042fb63b856"), "name" : "XXXX", "action" : "accept", "source" : "127.0.0.1", "srcport" : "80", "destination" : "192.168.0.13", "dstport" : "53213", "service" : "443", "service_id" : "https", "unixtime" : NumberLong("1412774569000"), "segment" : "MySegment", "direction" : "INCOMING", "location" : "US" }
I currently have ~5.5mio entries in my collection and the base query always is:
collection.count({"action":"2_different_action_types", "direction":"3_different_directions", "unixtime": {"$gte": 1412774000000, "$lte": 1412774900000}})
Action, direction and unixtime always exist in my query but their value is dynamic. Optional (also dynamic values) parameters are:
For example:
collection.count({"action":"2_different_action_types", "direction":"3_different_directions", "location":"US","segment":"mySegment", "unixtime": {"$gte": 1412774000000, "$lte": 1412774900000}})
collection.count({"action":"2_different_action_types", "direction":"3_different_directions", "service_id":"https", "unixtime": {"$gte": 1412774000000, "$lte": 1412774500000}})
I created the following indexes:
db.collection.createIndex( {unixtime: 1, action: 1, direction: 1 })
db.collection.createIndex( {unixtime: 1, action: 1, direction: 1 , location:1})
db.collection.createIndex( {unixtime: 1, action: 1, direction: 1 , service_id:1})
db.collection.createIndex( {unixtime: 1, action: 1, direction: 1 , segment:1})
db.collection.createIndex( {unixtime: 1, action: 1, direction: 1 , location:1, service_id: 1})
db.collection.createIndex( {unixtime: 1, action: 1, direction: 1 , location:1, segment: 1})
My query without the index took ~8sec, the query with an index ~6sec, which is still kinda slow.
How can I speed up the whole thing? Note, that at the moment I'm just counting the findings, not really looking for a specific entry.
Additional Info:
I'm currently trying to optimize those queries directly in the mongoshell but in the end, I'm querying via NodeJS (don't know if this is relevant for the solution).
Upvotes: 3
Views: 3843
Reputation: 46301
The indexes don't seem to make much sense this way. Not-equals-queries like $gte
and $lte
should be at the end - not only in the query, but also in the index. Putting unixtime at position 1 in the index is generally a bad idea (unless you need the set of distinct actions within a single second and the number of actions in a single second is so large that they need indexing, which is unlikely).
Try to reverse the indexes and make sure the order of the index matches the order in the query.
If location
, segment
and service_id
have low selectivity, try without an index on these fields first. More indexes cost more RAM and slow insertion and update time, but with low selectivity, the gain in queries is sometimes negligible. In the query, it might make sense to put the optional fields last, at the end of all the other operations - if the candidate set is small enough after the required criteria and the unixtime
interval, a collection scan of the remaining items shouldn't hurt performance too badly. If they do and the selectivity is high, move them further forward.
Upvotes: 2