Jit B

Reputation: 1246

Sharding key (MongoDB) for a large number of documents

I am developing a web application where users will upload a large number of documents, and different types of operations, including aggregation, will be performed on them. However, the number of documents uploaded by each user varies widely: some might upload a dozen documents, and some might upload a million.

Documents look something like this:

doc{
    _id: <self generated UUID>,
    uid: <id of user who uploaded the document>,
    ctime: <creation timestamp>,
    ....
        <other attributes, etc>
    ....
}

Now here is the problem in choosing the shard key:
1. If I choose the UUID as the shard key, documents uploaded by the same user are unlikely to end up in the same shard and aggregation operations will be costly.
2. If I use uid as the shard key, the data will not be evenly distributed across the shards.
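
To make the trade-off concrete, here is a toy Python sketch (not MongoDB code). It assumes a hypothetical 4-shard cluster and places each document by hashing the key modulo the shard count, which is a simplification of how MongoDB actually assigns chunks. The user names and document counts are made up.

```python
import hashlib
import uuid
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(value):
    """Toy placement: hash the key and take it modulo the shard count."""
    digest = hashlib.sha256(str(value).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# Made-up workload: one heavy user and twenty light users.
uploads = {"heavy_user": 5000}
uploads.update({f"light_user_{i:02d}": 12 for i in range(20)})

# Deterministic UUIDs stand in for the self-generated _id values.
docs = [
    {"_id": uuid.uuid5(uuid.NAMESPACE_OID, f"{uid}-{n}"), "uid": uid}
    for uid, count in uploads.items()
    for n in range(count)
]

# Option 1: shard on _id. Load is even, but one user's documents are scattered.
load_by_id = Counter(shard_for(d["_id"]) for d in docs)
heavy_shards_by_id = {shard_for(d["_id"]) for d in docs if d["uid"] == "heavy_user"}

# Option 2: shard on uid. Each user sits on one shard, but load is skewed.
load_by_uid = Counter(shard_for(d["uid"]) for d in docs)
heavy_shards_by_uid = {shard_for(d["uid"]) for d in docs if d["uid"] == "heavy_user"}

print("by _id:", dict(load_by_id), "heavy user on", len(heavy_shards_by_id), "shards")
print("by uid:", dict(load_by_uid), "heavy user on", len(heavy_shards_by_uid), "shards")
```

With the UUID key the heavy user's documents land on every shard, so a per-user aggregation has to touch all of them; with uid as the key they land on a single shard, which then carries most of the data.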

Can anyone suggest which is the best way to achieve this?

I am very new to partitioning and sharding, and my research on Google as well as Stack Overflow did not yield anything. I can change the schema of the documents if needed, since the project is still in the design phase.

Upvotes: 1

Views: 1715

Answers (2)

deepakmodak

Reputation: 1339

You can read more on shard key selection and scaling:

1] Kristina Chodorow's book "Scaling MongoDB" http://shop.oreilly.com/product/0636920018308.do

2] Antoine Girbal's presentation on Sharding Best Practices http://www.10gen.com/presentations/MongoNYC-2012/Sharding-Best-Practices-Advanced

Upvotes: 1

Eve Freeman

Reputation: 33155

This is the best guide I've seen on choosing a shard key: http://www.kchodorow.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/

You have to decide how you want to query the data. Perhaps a combination of uid and ctime will yield a good shard key, but I'm not sure whether that will cause you grief while querying, as you haven't given much insight into how you plan to query.
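
As a rough illustration of why a compound key might help, here is a small Python simulation of range-based chunk splitting. It is only a sketch under stated assumptions: the document-count threshold is a hypothetical stand-in for MongoDB's byte-based chunk size, `build_chunks` is a made-up helper, and the workload is invented. The one real constraint it models is that a chunk cannot be split between documents that share the same shard key value.

```python
MAX_CHUNK_DOCS = 1000  # hypothetical split threshold (real chunks are sized in bytes)


def build_chunks(keys, max_docs=MAX_CHUNK_DOCS):
    """Split sorted shard-key values into contiguous ranges.

    A split point can only fall between two different key values, so
    documents sharing one key value always stay in the same chunk.
    """
    chunks, current = [], []
    for key in sorted(keys):
        if len(current) >= max_docs and key != current[-1]:
            chunks.append(current)
            current = []
        current.append(key)
    if current:
        chunks.append(current)
    return chunks


# Made-up workload of (uid, ctime) pairs; ctime is just a counter here.
docs = [("heavy_user", t) for t in range(5000)]
docs += [(f"light_user_{i:02d}", t) for i in range(20) for t in range(12)]

uid_chunks = build_chunks([uid for uid, _ in docs])  # shard key on uid only
compound_chunks = build_chunks(docs)                 # shard key on (uid, ctime)

print("uid only:   ", [len(c) for c in uid_chunks])
print("uid + ctime:", [len(c) for c in compound_chunks])
```

In this toy model the uid-only key produces chunks of 5000 and 240 documents, because the heavy user's chunk can never be split, while the (uid, ctime) key produces five chunks of 1000 plus one of 240 that can be moved between shards independently. In the mongo shell the compound key would correspond to something like `sh.shardCollection("mydb.docs", { uid: 1, ctime: 1 })`, where the database and collection names are hypothetical.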

Upvotes: 3
