motormal

Reputation: 178

Proper Implementation of Hashed Shard Key In MongoDB

I have a collection that is currently indexed/queried by the built-in "_id" (ObjectId). I don't want to shard on this key since it is sequential (date-prefixed). The documentation for Mongo 2.4 says that I can shard on a hash of this key, which sounds great. Like so:

sh.shardCollection( "records.active", { _id: "hashed" } )

Question: do I have to first create the hashed index on the active collection with:

db.active.ensureIndex({ _id: "hashed" })

Or is that not necessary? I don't want to waste space with more indexing than is necessary.

Related question: if I do create a hashed index with ensureIndex({ _id: "hashed" }), can I drop the default "_id" index? Will Mongo know to take queries on the _id field, hash them, and run them against the hashed index?

Thanks...

Upvotes: 6

Views: 3205

Answers (2)

rendybjunior

Reputation: 602

I tried this myself using MongoDB 2.4.11.

I created a new collection and inserted documents into it, sending the queries through the mongos server. All 1,000,000 documents I inserted went to shard A, the primary shard for the database (you can check this using sh.status()).

However, when I tried to shard the collection with the command below,

sh.shardCollection("database.collection",{_id:"hashed"})

it failed with the following error:

{
    "proposedKey" : {
        "_id" : "hashed"
    },
    "curIndexes" : [
        {
            "v" : 1,
            "name" : "_id_",
            "key" : {
                "_id" : 1
            },
            "ns" : "database.collection"
        }
    ],
    "ok" : 0,
    "errmsg" : "please create an index that starts with the shard key before sharding."
}

So the answer is:

  1. Yes, it needs a hashed index.
  2. You have to create it beforehand. MongoDB requires you to do it manually using the command below:

    db.collection.ensureIndex( { _id: "hashed" } )
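As a rough sketch of that order of operations (create the hashed index first, then shard), the two shell calls can be wrapped in a small helper. The function name shardOnHashedId is hypothetical, and db and sh are simply the mongo shell globals passed in as parameters so the function can also be exercised with stand-in objects outside the shell.

```javascript
// Hypothetical helper, not part of MongoDB: create the hashed _id index,
// then shard the collection on it.
// `db` and `sh` are the mongo shell globals, passed in explicitly.
function shardOnHashedId(db, sh, dbName, collName) {
  var key = { _id: "hashed" };

  // A collection that already holds documents needs the index in place
  // before sharding, otherwise shardCollection fails with the error above.
  db.getSiblingDB(dbName).getCollection(collName).ensureIndex(key);

  // Shard on the same key pattern that was just indexed and hand back
  // whatever sh.shardCollection returns.
  return sh.shardCollection(dbName + "." + collName, key);
}
```

In the shell this would be called as shardOnHashedId(db, sh, "database", "collection"), run against mongos with sharding already enabled for the database.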

Upvotes: 1

James Wahlin

Reputation: 2821

Both the _id index and the hashed _id index will be needed. In MongoDB 2.4 you do not have to explicitly call db.active.ensureIndex({ _id: "hashed" }) before sharding your collection; if you don't, sh.shardCollection( "records.active", { _id: "hashed" } ) will create the hashed index for you, provided the collection is empty. A collection that already contains documents needs the index created first.

The _id index is required for replication.

To shard a collection in MongoDB you have to have an index on the shard key. This has not changed in MongoDB 2.4 and the hashed _id index will be required for sharding to work.
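To make the "index on the shard key" requirement concrete, here is a simplified sketch of the kind of check involved: does any existing index start with the proposed shard key? This is only an illustration, not MongoDB's actual implementation, which considers more than this. The function name hasIndexForShardKey is made up for the example, and the second index document (including its name) is illustrative.

```javascript
// Simplified illustration: returns true if some index key pattern starts
// with the shard key's fields, in the same order and with the same values.
function hasIndexForShardKey(indexes, shardKey) {
  var wanted = Object.keys(shardKey);
  return indexes.some(function (index) {
    var fields = Object.keys(index.key);
    return wanted.every(function (field, i) {
      return fields[i] === field && index.key[field] === shardKey[field];
    });
  });
}

// Only the default _id index exists, so the hashed shard key is not covered.
var curIndexes = [
  { v: 1, name: "_id_", key: { _id: 1 }, ns: "database.collection" }
];
var proposedKey = { _id: "hashed" };
var coveredBefore = hasIndexForShardKey(curIndexes, proposedKey);

// Once a hashed _id index is present, the shard key is covered.
curIndexes.push(
  { v: 1, name: "_id_hashed", key: { _id: "hashed" }, ns: "database.collection" }
);
var coveredAfter = hasIndexForShardKey(curIndexes, proposedKey);
```

Here coveredBefore is false and coveredAfter is true: the ascending { _id: 1 } index does not satisfy a { _id: "hashed" } shard key, which is why both indexes end up existing side by side.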

Upvotes: 3
