Reputation: 433
I was reading the MongoDB aggregation documentation at https://docs.mongodb.com/v3.2/aggregation/#aggregation-pipeline and it mentions that
"The aggregation pipeline can operate on a sharded collection."
If a database is sharded, are all the collections in that database sharded as well? Also, please confirm whether an aggregation query on a sharded collection runs on many servers and delivers the results faster. If so, how does the aggregation query work?
Regards
Kris
Upvotes: 2
Views: 3257
Reputation: 3160
In a sharded cluster, every database has a primary shard, which holds all of that database's unsharded collections. The cluster can have one or more other shards as well.
"If a database is sharded, are all the collections in that database sharded as well?"
No. By default, none of your collections are sharded, and unsharded collections stay wholly on the primary shard. To shard a collection, first enable sharding for the database, then run the shardCollection command on that collection.
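As a rough sketch of what that looks like, the snippet below only builds the two admin command documents (enableSharding for the database, shardCollection for one collection). The names "mydb", "orders" and "customer_id" are made up for illustration, and the helper function is hypothetical, not part of any driver.

```python
def sharding_commands(database, collection, shard_key):
    """Build the admin commands that shard a single collection.

    The command name has to be the first key of each document, which
    Python dicts guarantee through insertion order.
    """
    namespace = f"{database}.{collection}"
    return [
        {"enableSharding": database},                       # per database
        {"shardCollection": namespace, "key": shard_key},   # per collection
    ]


# Hypothetical names, chosen only for this example.
commands = sharding_commands("mydb", "orders", {"customer_id": 1})
for command in commands:
    print(command)

# Assumption: with PyMongo connected to a mongos, each document could be
# sent with client.admin.command(command).
```

Other collections in "mydb" stay unsharded until shardCollection is run for each of them.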
"Please confirm whether an aggregation query on a sharded collection runs on many servers and delivers the results faster."
One important thing when defining a collection in a sharded environment is the shard key, which determines how the data is distributed across the shards. If you choose a good shard key, you can expect better performance than in a non-sharded environment.
"If so, how does the aggregation query work?"
The aggregation query is routed by its $match stage to the shards that hold the matching documents, and the partial results are finally merged together on one shard. A good read is https://docs.mongodb.com/v3.2/core/aggregation-pipeline-sharded-collections/
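To make the split-and-merge idea concrete, here is a small pure-Python simulation. It does not talk to MongoDB at all: the two "shards" are plain lists with made-up order documents, and the two functions only mimic a $match plus a partial $group on each shard followed by a merge step.

```python
from collections import Counter

# Hypothetical data: each shard holds a slice of an "orders" collection.
SHARDS = {
    "shard_a": [
        {"status": "paid", "customer": "ann", "amount": 10},
        {"status": "open", "customer": "bob", "amount": 5},
    ],
    "shard_b": [
        {"status": "paid", "customer": "ann", "amount": 7},
        {"status": "paid", "customer": "cy", "amount": 3},
    ],
}


def shard_part(docs, status):
    """Runs on every shard: filter (like $match), then pre-aggregate (like $group)."""
    partial = Counter()
    for doc in docs:
        if doc["status"] == status:
            partial[doc["customer"]] += doc["amount"]
    return partial


def merge_part(partials):
    """Runs on one shard: add up the partial groups into final totals."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the counts together
    return dict(merged)


partials = [shard_part(docs, "paid") for docs in SHARDS.values()]
totals = merge_part(partials)
print(totals)  # {'ann': 17, 'cy': 3}
```

The speed-up comes from the first part running on every shard at once, so the merge step only sees small partial results instead of all the documents.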
Upvotes: 2
Reputation: 5466
The aggregation pipeline supports operations on sharded collections.
If the pipeline starts with an exact $match on a shard key, the entire pipeline runs on the matching shard only. Previously (prior to version 3.2), the pipeline would have been split into two parts, and the merging would have been done on the primary shard.
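The targeting rule can be pictured with a toy router. This is a simplified model, not how mongos is implemented: it assumes a ranged shard key called "customer_id" and a made-up chunk layout, and it broadcasts anything that is not an exact match on that key.

```python
import bisect

# Hypothetical chunk layout for a ranged shard key "customer_id":
# shard_a owns ids below 100, shard_b owns 100..199, shard_c owns 200 and up.
UPPER_BOUNDS = [100, 200]
SHARD_NAMES = ["shard_a", "shard_b", "shard_c"]


def target_shards(match, shard_key="customer_id"):
    """Return the shards a leading $match would have to visit."""
    value = match.get(shard_key)
    if value is None or isinstance(value, dict):
        # No shard key, or an operator expression such as {"$gt": 5}:
        # this sketch simply broadcasts to every shard.
        return list(SHARD_NAMES)
    # Exact match on the shard key: exactly one shard owns that value.
    return [SHARD_NAMES[bisect.bisect_right(UPPER_BOUNDS, value)]]


print(target_shards({"customer_id": 150}))   # ['shard_b']
print(target_shards({"status": "paid"}))     # ['shard_a', 'shard_b', 'shard_c']
```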
When an aggregation operation must run on multiple shards and does not require the database’s primary shard, the results are routed to a random shard for merging, which avoids overloading the primary shard for that database. The $out stage and the $lookup stage require running on the database’s primary shard.
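The merge-shard rule described above can be written down as a tiny function. This is only an illustration of the rule as stated in this answer; the function name, the shard names and the pipeline contents are invented for the example.

```python
import random


def pick_merge_shard(pipeline, shards, primary, rng=random):
    """Choose where the merge step runs, following the rule stated above."""
    needs_primary = any(
        "$out" in stage or "$lookup" in stage for stage in pipeline
    )
    if needs_primary:
        return primary
    # Otherwise any shard will do, which spreads the merge load around.
    return rng.choice(shards)


shards = ["shard_a", "shard_b", "shard_c"]
with_out = [{"$match": {"status": "paid"}}, {"$out": "paid_orders"}]
plain = [{"$match": {"status": "paid"}}, {"$group": {"_id": "$customer"}}]

print(pick_merge_shard(with_out, shards, primary="shard_a"))  # shard_a
print(pick_merge_shard(plain, shards, primary="shard_a"))     # any of the three
```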
When splitting the aggregation pipeline into two parts, the pipeline is split to ensure that the shards perform as many stages as possible with consideration for optimization.
Reference: https://docs.mongodb.com/manual/core/aggregation-pipeline-sharded-collections/
Upvotes: 3