Sharding key (MongoDB) for a large number of documents

Question

I am developing a web application where users will upload a large number of documents, and different types of operations will be performed on those documents, including aggregation. However, the number of documents uploaded by each user varies widely: some might upload a dozen documents, and some might upload a million.

Documents look something like this:

doc{
    _id: ,
    uid: ,
    ctime: ,
    ....
        
    ....
}

Now here is the problem in choosing the shard key:
1. If I choose the UUID (_id) as the shard key, documents uploaded by the same user are unlikely to end up on the same shard, and aggregation operations will be costly.
2. If I use uid as the shard key, the data will not be evenly distributed across the shards.

Can anyone suggest which is the best way to achieve this?

I am very new to partitioning and sharding, and my research on Google as well as Stack Overflow did not yield anything. I can change the schema of the documents if needed, since the project is still in the design phase.

Eve Freeman · Accepted Answer

This is the best guide I've seen on choosing a shard key: http://www.kchodorow.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/

You have to decide how you want to query the data. Perhaps a combination of uid and ctime will yield a good shard key, but I'm not sure if that will cause you grief while querying, as you haven't given much insight on how you plan to query.
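
To make the uid plus ctime idea a bit more concrete, here is a rough sketch in plain JavaScript. It does not talk to a MongoDB server; it only models range-based chunking on a compound key, under the simplifying assumption that chunks are cut by document count (real chunks are split by size). The user names, document counts, and helper functions are all made up for illustration.

```javascript
// Hypothetical model of range-based chunking on the compound key (uid, ctime).
// Nothing here is a MongoDB API; it only illustrates how key ranges get cut.

// Order two documents by uid first, then by ctime.
function compareByKey(a, b) {
  if (a.uid !== b.uid) return a.uid < b.uid ? -1 : 1;
  return a.ctime - b.ctime;
}

// Sort by the compound key and cut the sorted run into chunks of at most
// maxDocs documents (assumption: count-based splitting, for simplicity).
function splitIntoChunks(docs, maxDocs) {
  const sorted = [...docs].sort(compareByKey);
  const chunks = [];
  for (let i = 0; i < sorted.length; i += maxDocs) {
    chunks.push(sorted.slice(i, i + maxDocs));
  }
  return chunks;
}

// Count how many chunks hold at least one document of the given user.
function chunksForUser(chunks, uid) {
  return chunks.filter((chunk) => chunk.some((d) => d.uid === uid)).length;
}

// Made-up workload: one heavy uploader and two light ones.
function makeDocs(uid, count) {
  return Array.from({ length: count }, (_, i) => ({ uid, ctime: i }));
}

const docs = [
  ...makeDocs("heavy", 1000),
  ...makeDocs("light-a", 12),
  ...makeDocs("light-b", 12),
];

const chunks = splitIntoChunks(docs, 100);
console.log("total chunks:", chunks.length);                               // 11
console.log("chunks holding 'heavy':", chunksForUser(chunks, "heavy"));     // 10
console.log("chunks holding 'light-a':", chunksForUser(chunks, "light-a")); // 1
```

The point of the sketch: with uid alone as the key, all of a heavy user's documents share one key value and cannot be split apart, whereas with (uid, ctime) they can be divided into several ranges that the balancer is free to move, while a light user's documents still sit together. In the shell this would correspond to something like `sh.shardCollection("mydb.docs", { uid: 1, ctime: 1 })`, where the namespace `mydb.docs` is a placeholder. Whether this works well still depends on your queries including uid.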
