Chris Dutrow

Reputation: 50372

Can a list field be a shard key in MongoDB?

I have some data that looks like this:

widget:
{
    categories: ['hair', 'nails', 'dress'],
    colors:     ['red', 'white']
}

The data needs to be queried like this:

SELECT * FROM widget_table WHERE categories == 'hair' AND colors == 'red'

I would like to put this data into a MongoDB sharded cluster. However, it seems that an ideal shard key would not be a list field. In this case, avoiding a list field is not possible because all of the fields are list fields.

Thanks so much!

Upvotes: 1

Views: 1649

Answers (2)

Stennie

Reputation: 65393

Sharding in MongoDB (as of 2.4) works by partitioning your documents into ranges of values based on the shard key. A list or array field does not make sense as a shard key because it contains multiple values, so a document could not be assigned to a single range.

It's also worth noting that the shard key is immutable (cannot be changed once set for a document), so you do not want to choose fields that you intend to update.

If you do not have any candidate fields in your documents, you could always add one. A straightforward solution in your case could be to use the new hashed sharding in MongoDB 2.4:

The field you choose as your hashed shard key should have a good cardinality, or large number of different values. Hashed keys work well with fields that increase monotonically like ObjectId values or timestamps.
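
To see why hashing helps with monotonically increasing values, here is a minimal Python sketch. It is not MongoDB's actual implementation: MongoDB computes its own hash and assigns ranges of hashed values to chunks, whereas this sketch uses MD5 modulo a hypothetical shard count purely for illustration. The names `shard_for`, `distribution`, and `NUM_SHARDS` are invented for the example.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size, chosen for illustration


def shard_for(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard by hashing it (illustrative, not MongoDB's algorithm)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned 64-bit integer.
    hashed = int.from_bytes(digest[:8], "big")
    return hashed % num_shards


def distribution(keys, num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


# Monotonically increasing keys, such as timestamps or counters.
keys = range(1_000_000, 1_001_000)
counts = distribution(keys)
```

With a range-based shard key, consecutive values like these would all land in the chunk holding the current maximum. After hashing, `counts` should show the inserts spread across all four shards.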

An obvious question to consider before sharding is "do you need to shard?". Sharding is an approach for scaling out writes with MongoDB, but can be overkill if you aren't yet pushing the limits of your current configuration.

Upvotes: 1

Chris Dutrow

Reputation: 50372

Some of the feedback I am getting asserts that it is not possible to use a list field as a shard key, so I want to illustrate how this use case could be sharded within the limitations of MongoDB:

Original object:

widget:
{
    primary_key: '2389sdjsdafnlfda',

    categories: ['hair', 'nails', 'dress'],
    colors:     ['red', 'white'],

    # All the other fields in the document that don't need to be queried:
    ...
    ...
}

The data layer splits the object into multiple pointer objects, one per element of the list field chosen as the shard key:

widget_pointer:
{
    primary_key: '2389sdjsdafnlfda',
    categories: 'hair',
    colors:     ['red', 'white']
}

widget_pointer:
{
    primary_key: '2389sdjsdafnlfda',
    categories: 'nails',
    colors:     ['red', 'white']
}

widget_pointer:
{
    primary_key: '2389sdjsdafnlfda',
    categories: 'dress',
    colors:     ['red', 'white']
}

Explanation:

  • The field categories can now be the shard key in MongoDB.
  • The original object will now be stored in a key-value store. Queries against the data in MongoDB will return a pointer object that will be used to get the object from the key-value store.
  • Queries on the MongoDB data that filter on a single categories value will hit only one shard.
  • Insertions on the MongoDB data will hit at most as many shards as there are elements in the list; in most cases, only a small subset of the total number of shards will be affected.
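
The splitting step described above could be sketched in Python roughly as follows. This is only an illustration of the data-layer logic using plain dictionaries, not MongoDB code; the helper names `make_pointers` and `find_primary_keys` are hypothetical, and the in-memory filter stands in for a real query against the pointer collection.

```python
from typing import Any, Dict, List


def make_pointers(widget: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split one widget into one pointer document per category."""
    return [
        {
            "primary_key": widget["primary_key"],
            "categories": category,            # scalar, so usable as a shard key
            "colors": list(widget["colors"]),  # stays a list on every pointer
        }
        for category in widget["categories"]
    ]


def find_primary_keys(pointers: List[Dict[str, Any]], category: str, color: str) -> List[str]:
    """Stand-in for the query: categories == category AND colors contains color."""
    return [
        p["primary_key"]
        for p in pointers
        if p["categories"] == category and color in p["colors"]
    ]


widget = {
    "primary_key": "2389sdjsdafnlfda",
    "categories": ["hair", "nails", "dress"],
    "colors": ["red", "white"],
}
pointers = make_pointers(widget)
matches = find_primary_keys(pointers, "hair", "red")
```

Here `pointers` holds three documents, one per category, and `matches` contains the primary key that would then be looked up in the key-value store.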

Upvotes: 3
