Reputation: 361
I have a database query with pymongo , like so
pipeline = [
{"$group": {"_id": "$product", "count": {"$sum": 1}}},
]
rows = list(collection_name.aggregate(pipeline))
print(rows)
The result like to
[
{'_id': 'p1', 'count': 45},
{'_id': 'p2', 'count': 4},
{'_id': 'p3', 'count': 96},
{'_id': 'p1', 'count': 23},
{'_id': 'p4', 'count': 10}
]
Objective:
On the basis of the above results, I want to conduct statistics between partitions. For example, To get the number of count times in the following intervals:
partition, count
(0, 10], 2
[11, 50), 2
[50, 100], 1
Is there a way of doing this entirely using MongoDB aggregate framework?
Any comments would be very helpful. Thanks.
Answer by @Wernfried Domscheit
$bucket
pipeline = [
{"$group": {"_id": "$product", "count": {"$sum": 1}}},
{"$bucket": {
"groupBy": "$count",
"boundaries": [0, 11, 51, 100],
"default": "Other",
"output": {
"count": {"$sum": 1},
}
}}
]
rows = list(tbl_athletes.aggregate(pipeline))
rows
$bucketAuto
pipeline = [
{"$group": {"_id": "$product", "count": {"$sum": 1}}},
{"$bucketAuto": {
"groupBy": "$count",
"buckets": 5,
"output": {
"count": {"$sum": 1},
}
}}
]
rows = list(tbl_athletes.aggregate(pipeline))
rows
NOTICE:
In $bucket
,default
must be there.
Upvotes: 0
Views: 605
Reputation: 59436
Yes, you have the $bucket operator for that:
db.collection.aggregate([
{
$bucket: {
groupBy: "$count",
boundaries: [0, 11, 51, 100],
output: {
count: { $sum: 1 },
}
}
}
])
Or use $bucketAuto where the intervals are generated automatically.
Upvotes: 2