MongoDB object field and range query index

Question

I have the following structure in the database:

{
    "_id" : {
       "user" : 14197,
       "date" : ISODate("2014-10-24T00:00:00.000Z")
    },
...
}

I have a performance problem when I try to select data by user and date range. MongoDB doesn't use an index and runs a full scan over the collection.

db.timeuse.daily.find({ "_id.user": 289006, "_id.date" : {$gt: ISODate("2014-10-23T00:00:00Z"), $lte: ISODate("2014-10-30T00:00:00Z")}}).explain()
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 6,
    "nscannedObjects" : 66967,
    "nscanned" : 66967,
    "nscannedObjectsAllPlans" : 66967,
    "nscannedAllPlans" : 66967,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 523,
    "nChunkSkips" : 0,
    "millis" : 1392,
    "server" : "mongo-shard0003:27018",
    "filterSet" : false,
    "stats" : {
        "type" : "COLLSCAN",
        "works" : 66969,
        "yields" : 523,
        "unyields" : 523,
        "invalidates" : 16,
        "advanced" : 6,
        "needTime" : 66962,
        "needFetch" : 0,
        "isEOF" : 1,
        "docsTested" : 66967,
        "children" : [ ]
    },
    "millis" : 1392
}

So far I have found only one workaround: using $in.

db.timeuse.daily.find({"_id": { $in: [
    {"user": 289006, "date": ISODate("2014-10-23T00:00:00Z")},
    {"user": 289006, "date": ISODate("2014-10-24T00:00:00Z")}
]}}).explain()



{
    "cursor" : "BtreeCursor _id_",
    "isMultiKey" : false,
    "n" : 2,
    "nscannedObjects" : 2,
    "nscanned" : 2,
    "nscannedObjectsAllPlans" : 2,
    "nscannedAllPlans" : 2,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "_id" : [
            [
                {
                    "user" : 289006,
                    "date" : ISODate("2014-10-23T00:00:00Z")
                },
                {
                    "user" : 289006,
                    "date" : ISODate("2014-10-23T00:00:00Z")
                }
            ],
            [
                {
                    "user" : 289006,
                    "date" : ISODate("2014-10-24T00:00:00Z")
                },
                {
                    "user" : 289006,
                    "date" : ISODate("2014-10-24T00:00:00Z")
                }
            ]
        ]
    },

Is there a more elegant way to run this kind of query?

mnemosyn · Accepted Answer

TL;DR: Don't put your data in the _id field. Store user and date as regular top-level fields and use a compound index: db.timeuse.daily.ensureIndex( { "user" : 1, "date": 1 } ).
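
As a rough sketch of what that change means for the documents and the query, the plain JavaScript below reshapes one document and builds the range filter. The helper names flattenId and userDateRange and the minutes field are made up for illustration; nothing here talks to a server. Since _id can't be changed on an existing document, I'm assuming the reshaped documents get re-inserted and MongoDB assigns a fresh _id.

```javascript
// Hypothetical helper: move the two parts of the old object-valued _id to
// top-level fields so a { user: 1, date: 1 } index can cover them.
// The old _id is dropped on purpose; a new one is assigned on insert.
function flattenId(doc) {
  const { _id, ...rest } = doc;
  return { ...rest, user: _id.user, date: _id.date };
}

// Hypothetical helper: filter for "one user, date in (from, to]".
function userDateRange(user, from, to) {
  return { user: user, date: { $gt: from, $lte: to } };
}

// "minutes" is an invented payload field standing in for the elided data.
const oldDoc = {
  _id: { user: 14197, date: new Date("2014-10-24T00:00:00Z") },
  minutes: 42,
};
const newDoc = flattenId(oldDoc);

const filter = userDateRange(
  289006,
  new Date("2014-10-23T00:00:00Z"),
  new Date("2014-10-30T00:00:00Z")
);

// In the mongo shell these would be used roughly like this:
//   db.timeuse.daily.ensureIndex({ user: 1, date: 1 })
//   db.timeuse.daily.find(filter)
console.log(newDoc, filter);
```

The old _id also guaranteed one document per user and date. If you rely on that, passing { unique: true } as the second argument to ensureIndex should keep the guarantee.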

Explanation: You're abusing the _id key convention, or more precisely the fact that MongoDB can index entire objects. What you want to achieve requires either index intersection or a compound index. Index intersection combines two separate indexes; it should be available in MongoDB by now, but it has limitations. A compound index is a single index over a set of keys.

The _id field is indexed by default, but it's indexed as a whole, i.e. the _id index will only support equality queries on the entire object, rather than on parts of the object. That also explains why the $in query works.
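
To make the "indexed as a whole" point concrete, here is a toy model in plain JavaScript. It is not how MongoDB stores its index; it only mimics exact matching on the full _id value. The names serializeKey, buildWholeObjectIndex and findIn and the minutes field are invented for the sketch, and dates are plain strings to keep it short.

```javascript
// Toy model of an index keyed on the entire _id object.
// serializeKey keeps field order, so reordered fields give a different key.
function serializeKey(id) {
  return JSON.stringify(id);
}

function buildWholeObjectIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    index.set(serializeKey(doc._id), doc);
  }
  return index;
}

// Same idea as { _id: { $in: ids } }: one exact lookup per candidate key.
function findIn(index, ids) {
  return ids
    .map((id) => index.get(serializeKey(id)))
    .filter((doc) => doc !== undefined);
}

const docs = [
  { _id: { user: 289006, date: "2014-10-23T00:00:00Z" }, minutes: 10 },
  { _id: { user: 289006, date: "2014-10-24T00:00:00Z" }, minutes: 20 },
  { _id: { user: 14197, date: "2014-10-24T00:00:00Z" }, minutes: 30 },
];
const index = buildWholeObjectIndex(docs);

// Full keys are found directly.
const hits = findIn(index, [
  { user: 289006, date: "2014-10-23T00:00:00Z" },
  { user: 289006, date: "2014-10-24T00:00:00Z" },
]);

// Same fields in a different order: a different key, so no match.
const reordered = findIn(index, [
  { date: "2014-10-23T00:00:00Z", user: 289006 },
]);

console.log(hits.length, reordered.length);
```

Note that nothing in this structure lets you ask for "user 289006, any date in a range" without visiting every entry, which is what the COLLSCAN in the question amounts to.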

In general, using an embedded document as _id with only the default index will behave oddly. Consider this:

> db.sort.insert({"_id" : {"name" : "foo", value : 4343} });
> db.sort.insert({"_id" : {"name" : "foo", value : 4343, bla : "fooffo"} });
> db.sort.find();
{ "_id" : { "name" : "foo", "value" : 4343 } }
{ "_id" : { "name" : "foo", "value" : 4343, "bla" : "fooffo" } }

> db.sort.find({"_id" : {"name" : "foo", value : 4343} }); 
{ "_id" : { "name" : "foo", "value" : 4343 } }
// no second result here...

Imagine that MongoDB basically hashed the entire object and simply looked up that hash: such an index can't support range queries on just one part of the object.
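
By contrast, a compound index on user and date can be pictured as entries sorted by user first and date second, so one user's dates sit next to each other. The sketch below models that with a sorted array and a binary search. It is only an illustration of the ordering idea, not MongoDB's implementation, and all function names are made up; dates are millisecond timestamps.

```javascript
// Toy model of a compound index on { user: 1, date: 1 }.
function compareKeys(a, b) {
  if (a.user !== b.user) return a.user < b.user ? -1 : 1;
  if (a.date !== b.date) return a.date < b.date ? -1 : 1;
  return 0;
}

function buildCompoundIndex(docs) {
  return docs
    .map((doc) => ({ user: doc.user, date: doc.date, doc }))
    .sort(compareKeys);
}

// Binary search for the first entry whose key is >= the given key.
function lowerBound(entries, key) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(entries[mid], key) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// user == u AND date > from AND date <= to, visiting only the matching slice
// (plus at most one boundary entry on each side).
function rangeScan(entries, user, from, to) {
  const result = [];
  for (let i = lowerBound(entries, { user, date: from }); i < entries.length; i++) {
    const entry = entries[i];
    if (entry.user !== user || entry.date > to) break;
    if (entry.date > from) result.push(entry.doc);
  }
  return result;
}

const day = (s) => Date.parse(s + "T00:00:00Z");
const entries = buildCompoundIndex([
  { user: 289006, date: day("2014-10-31") },
  { user: 14197, date: day("2014-10-24") },
  { user: 289006, date: day("2014-10-24") },
  { user: 289006, date: day("2014-10-23") },
  { user: 289006, date: day("2014-10-30") },
]);

const found = rangeScan(entries, 289006, day("2014-10-23"), day("2014-10-30"));
console.log(found.map((d) => new Date(d.date).toISOString()));
```

Equality on the leading key plus a range on the second key maps to one contiguous slice of the sorted entries, which is why the field order in the index definition matters.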
