amaatouq
amaatouq

Reputation: 2337

Getting rid of _id in mongodb collection

I know it is not possible to remove the _id field in a mongodb collection. However, the size of my collections is large, that the index on the _id field prevents me from loading the other indices in the RAM. My machine has 125GB of RAM and my collection stats is as follows:

 db.call_records.stats()
{
    "ns" : "stc_cdrs.call_records",
    "count" : 1825338618,
    "size" : 438081268320,
    "avgObjSize" : 240,
    "storageSize" : 468641284752,
    "numExtents" : 239,
    "nindexes" : 3,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 0,
    "userFlags" : 1,
    "totalIndexSize" : 165290709024,
    "indexSizes" : {
        "_id_" : 73450862016,
        "caller_id_1" : 45919923504,
        "receiver_id_1" : 45919923504
    },
    "ok" : 1
}

When I do a query like the following:

db.call_records.find({ "$or" : [ { "caller_id": 125091840205 }, { "receiver_id" : 125091840205 } ] }).explain()
{
    "clauses" : [
        {
            "cursor" : "BtreeCursor caller_id_1",
            "isMultiKey" : false,
            "n" : 401,
            "nscannedObjects" : 401,
            "nscanned" : 401,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "caller_id" : [
                    [
                        125091840205,
                        125091840205
                    ]
                ]
            }
        },
        {
            "cursor" : "BtreeCursor receiver_id_1",
            "isMultiKey" : false,
            "n" : 383,
            "nscannedObjects" : 383,
            "nscanned" : 383,
            "scanAndOrder" : false,
            "indexOnly" : false,
            "nChunkSkips" : 0,
            "indexBounds" : {
                "receiver_id" : [
                    [
                        125091840205,
                        125091840205
                    ]
                ]

it takes more than 15 seconds on average to return the results. The indices for both caller_id and receiver_id should be around 90GB, which is OK. However, the 73GB index on the _id makes this query very slow.

Upvotes: 0

Views: 985

Answers (2)

Jeff S
Jeff S

Reputation: 7484

There are a lot of potential issues here.

The first is that your indexes do not include all of the data returned. This means Mongo is getting the _id from the index and then using the _id to retrieve and return the document in question. So removing the _id index, even if you could, would not help.

Second, the query includes an OR. This forces Mongo to load both indexes so that it can read them and then retrieve the documents in question.

To improve performance, I think you have just a few choices:

  1. Add the additional elements to the indexes and restrict the data returned to what is available in the index (this would change indexOnly = true in the explain results)
  2. Explore sharding as Skooppa.com mentioned.
  3. Rework the query and/or the document to eliminate the OR condition.

Upvotes: 1

Salvador Dali
Salvador Dali

Reputation: 222531

You correctly told that you can not remove _id field from your document. You also can not remove an index from this field, so this is something you have to live with.

For some reason you start with the assumption that _id index makes your query slow, which is completely unjustifiable and most probably is wrong. This index is not used and just stays there untouched.

Few things I would try to do in your situation:

  • You have 400 billion documents in your collection, have you thought that this is a right time to start sharding your database? In my opinion you should.

  • use explain with your query to actually figure out what slows it down.

Looking at your query, I would also try to do the following: change your document from

{
  ... something else ...
  receiver_id: 234,
  caller_id: 342
}

to

{
   ... something else ...
   participants: [342, 234]
}

where your participants are [caller_id, receiver_id] in this order, then you can put only one index on this field. I know that it will not make your indices smaller, but I hope that because you will not use $or clause, you will get results faster. P.S. if you will do this, do not do this in production, test whether it give you a significant improvement and only then change in prod.

Upvotes: 6

Related Questions