Reputation: 57
I have a doubt regarding Mongo compound shard keys. Let's suppose I have document that is structured like this:
{
"players": [
{
"id": "12345",
"name": "John",
},
{
"id": "23415",
"name": "Doe",
}
]
}
Players embedded documents are always present and always 2. I think that the "players.0.id" and "players.1.id" should be a good choice as shard keys because are not monotonic and are evenly distributed.
What I can't understand from the documentation is if:
In other words, if I query the Collection to get all games played by John (as player 1 or player 2) the query will be sent to one chunk or to all chunks?
Upvotes: 2
Views: 1644
Reputation: 13765
You cannot create a shard key where part of the key is a multikey index (i.e. index on an array field). This is mentioned in Shard Key Index Type:
A shard key index cannot be an index that specifies a multikey index, a text index or a geospatial index on the shard key fields.
If you have exactly two items under the players
field, why not create two sub-documents instead of using an array? An array is typically useful for use cases where you have multiple items of indeterminate number in a document. For example, this structure might work for your use case:
{
"players": {
"player_1": {
"id" : 12345,
"name": "John"
},
"player_2": {
"id": 54321,
"name": "Doe"
}
}
}
You can then create an index like:
> db.test.createIndex({'players.player_1.id':1, 'players.player_2.id':1})
To answer your questions, if you're using this shard key, then:
There is no guarantee that the same player_1.id
and player_2.id
will be on the same chunk. This will depend on your data distribution.
If you query John as player_1 OR player_2
, the query will be sent to all shards. This is because you have a compound index as the shard key, and you're searching for an exact match on the non-prefix field.
To elaborate on question 2:
The query you're doing is this:
db.test.find({$or: [
{'players.player_1.id':123},
{'players.player_2.id':123}
]})
In a compound index, the index was first sorted by player_1.id
, then for each player_1.id
, there exist sorted player_2.id
. For example, if you have 10 documents with some combination of values for player_1.id
and player_2.id
, you can visualize the index like this:
player_1.id | player_2.id
------------|-------------
0 | 10
0 | 123
1 | 100
1 | 123
2 | 123
2 | 150
123 | 10
123 | 100
123 | 123
123 | 150
Note that the value player_2.id: 123
occur multiple times in the table, once per each player_1.id
. Also note that for each player_1.id
value, the player_2.id
values are sorted within it.
This is how MongoDB's compound index works and how it's sorted. There are more nuances with compound indexes that is too long to explain here, but the details are explained in the Compound Indexes page
The effect of this ordering method is that, there are many, many identical player_2.id
values spread across the index. Since the overall index is only sorted in terms of player_1.id
, it is not possible to find an exact player_2.id
without specifying player_1.id
. Hence, the above query will be sent to all shards.
Upvotes: 1