Brandt

Reputation: 565

Grouping documents by an array, treat array as a set

I have a fairly simple-sounding task I'd like to achieve using MongoDB's aggregation pipeline. I want to treat the arrays in one field as sets (i.e., disregarding order and duplicates) and group by them. As an example, the collection might be:

[
    {
        _id: 1,
        names: ["a", "b"]
    },
    {
        _id: 2,
        names: ["c", "a"]
    },
    {
        _id: 3,
        names: ["b", "a"]
    }
]

And the result I want back is something like:

[
    {
        names: ["a", "b"],
        count: 2
    },
    {
        names: ["a", "c"],
        count: 1
    }
]

Thanks!

Upvotes: 2

Views: 70

Answers (2)

Saleem

Reputation: 8978

You can definitely get your result by chaining several aggregation pipeline stages.

db.collection.aggregate([
 {$unwind:"$names"},
 {$sort:{_id:1, names:1}},
 {$group:{_id:"$_id", names:{$push:"$names"}}},
 {$group:{_id:"$names", count:{$sum:1}}},
 {$project:{_id:0, names:"$_id", count:1}}
])

It emits:

{ 
    "count" : NumberInt(1), 
    "names" : [
        "a", 
        "c"
    ]
}
{ 
    "count" : NumberInt(2), 
    "names" : [
        "a", 
        "b"
    ]
}

Upvotes: 1

Blakes Seven

Reputation: 50406

You need to $sort the unwound array elements so that they form a consistent grouping key. There really is no other way:

db.collection.aggregate([
    { "$unwind": "$names" },
    { "$sort": { "_id": 1, "names": 1} },
    { "$group": {
        "_id": "$_id",
        "names": { "$push": "$names" }
    }},
    { "$group": {
        "_id": "$names",
        "count": { "$sum": 1 }
    }}
])

This returns what you asked for, with the set as the _id:

[
    {
        "_id": ["a", "b"],
        "count": 2
    },
    {
        "_id": ["a", "c"],
        "count": 1
    }
]

Whilst there are quite a few operators that treat arrays as "sets", none of them reorders the array content into a consistent order that could serve as a grouping key. That only happens when you $sort.
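To make the reasoning concrete, here is a minimal in-memory sketch in plain JavaScript of what the grouping has to compute. It is not MongoDB code: `groupBySet` is a hypothetical helper name, and it assumes `names` only holds strings. It shows why a canonical (sorted) order is what makes two arrays land in the same group:

```javascript
// In-memory sketch (plain JavaScript, not a MongoDB API): treat each
// `names` array as a set by de-duplicating and sorting it, then count
// how many documents share each canonical array.
function groupBySet(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Set drops duplicates; sort() gives one canonical order per set.
    const names = [...new Set(doc.names)].sort();
    // Arrays are compared by reference in JS, so use a string key.
    const key = JSON.stringify(names);
    const entry = counts.get(key) || { names, count: 0 };
    entry.count += 1;
    counts.set(key, entry);
  }
  return [...counts.values()];
}

// Sample data taken from the question.
const docs = [
  { _id: 1, names: ["a", "b"] },
  { _id: 2, names: ["c", "a"] },
  { _id: 3, names: ["b", "a"] }
];

console.log(groupBySet(docs));
```

Without the sort() call, ["a", "b"] and ["b", "a"] would produce different keys, which is the same problem the aggregation pipeline has before the $sort stage.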

Even if the arrays contained "duplicates" and had some set transformation applied, they would still not be consistently ordered:

db.testa.insertMany([
    { "a" : [ "a", "b" ] },
    { "a" : [ "b", "a" ] },
    { "a" : [ "b", "a", "a" ] }
])

db.testa.aggregate([{ "$project": { "_id": 0, "a": { "$setUnion": [ "$a", [] ] } } }])

That sample returns of course:

{ "a" : [ "b", "a" ] }
{ "a" : [ "a", "b" ] }
{ "a" : [ "a", "b" ] }

So you would "still" need to $unwind and $sort in order to get a consistent "set" for grouping purposes.
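Note that $push keeps repeated values, so with the pipeline as written a document like [ "b", "a", "a" ] would end up under the key [ "a", "a", "b" ] rather than [ "a", "b" ]. One possible variation, sketched here as an untested assumption rather than a definitive pipeline, adds a $group on the (document, name) pair after the $unwind to collapse duplicates before sorting. The field names `doc` and `name` inside the compound _id are made up for illustration:

```javascript
// Sketch of a pipeline that also ignores duplicates within each array.
// In the shell it would be passed as db.collection.aggregate(pipeline).
const pipeline = [
  // One document per array element.
  { $unwind: "$names" },
  // Collapse repeated values inside the same source document.
  { $group: { _id: { doc: "$_id", name: "$names" } } },
  // Put each document's distinct values into a consistent order.
  { $sort: { "_id.doc": 1, "_id.name": 1 } },
  // Rebuild one sorted, duplicate-free array per source document.
  { $group: { _id: "$_id.doc", names: { $push: "$_id.name" } } },
  // Group on the canonical array and count the documents.
  { $group: { _id: "$names", count: { $sum: 1 } } }
];
```

As with the original pipeline, documents whose names array is empty or missing are dropped by $unwind and so do not appear in the counts.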

Upvotes: 1
