Andrea Morandini
Andrea Morandini

Reputation: 57

MongoDB compound shard key

I have a doubt regarding Mongo compound shard keys. Let's suppose I have document that is structured like this:

{
   "players": [
      {
        "id": "12345",
        "name": "John",
      },
      {
        "id": "23415",
        "name": "Doe",
      }
   ]
}

Players embedded documents are always present and always 2. I think that the "players.0.id" and "players.1.id" should be a good choice as shard keys because are not monotonic and are evenly distributed.

What I can't understand from the documentation is if:

  1. All documents with same "players.0.id" OR same "players.1.id" are supposed to be saved into the same Chunk, or
  2. All documents with same "players.0.id" AND same "players.1.id" are supposed to be saved into the same Chunk.

In other words, if I query the Collection to get all games played by John (as player 1 or player 2) the query will be sent to one chunk or to all chunks?

Upvotes: 2

Views: 1644

Answers (1)

kevinadi
kevinadi

Reputation: 13765

You cannot create a shard key where part of the key is a multikey index (i.e. index on an array field). This is mentioned in Shard Key Index Type:

A shard key index cannot be an index that specifies a multikey index, a text index or a geospatial index on the shard key fields.

If you have exactly two items under the players field, why not create two sub-documents instead of using an array? An array is typically useful for use cases where you have multiple items of indeterminate number in a document. For example, this structure might work for your use case:

{
    "players": {
        "player_1": {
            "id" : 12345,
            "name": "John"
        },
        "player_2": {
            "id": 54321,
            "name": "Doe"
        }
    }
}

You can then create an index like:

> db.test.createIndex({'players.player_1.id':1, 'players.player_2.id':1})

To answer your questions, if you're using this shard key, then:

  1. There is no guarantee that the same player_1.id and player_2.id will be on the same chunk. This will depend on your data distribution.

  2. If you query John as player_1 OR player_2, the query will be sent to all shards. This is because you have a compound index as the shard key, and you're searching for an exact match on the non-prefix field.

To elaborate on question 2:

The query you're doing is this:

db.test.find({$or: [
    {'players.player_1.id':123},
    {'players.player_2.id':123}
]})

In a compound index, the index was first sorted by player_1.id, then for each player_1.id, there exist sorted player_2.id. For example, if you have 10 documents with some combination of values for player_1.id and player_2.id, you can visualize the index like this:

player_1.id | player_2.id
------------|-------------
0           | 10
0           | 123
1           | 100
1           | 123
2           | 123
2           | 150
123         | 10
123         | 100
123         | 123
123         | 150

Note that the value player_2.id: 123 occur multiple times in the table, once per each player_1.id. Also note that for each player_1.id value, the player_2.id values are sorted within it.

This is how MongoDB's compound index works and how it's sorted. There are more nuances with compound indexes that is too long to explain here, but the details are explained in the Compound Indexes page

The effect of this ordering method is that, there are many, many identical player_2.id values spread across the index. Since the overall index is only sorted in terms of player_1.id, it is not possible to find an exact player_2.id without specifying player_1.id. Hence, the above query will be sent to all shards.

Upvotes: 1

Related Questions