Hosre Thinker
Hosre Thinker

Reputation: 53

Intersection of several arrays

I have some documents having a array protperty Items. I want to get the intercept between n docuements.

db.things.insert({name:"A", items:[1,2,3,4,5]})
db.things.insert({name:"B", items:[2,4,6,8]})
db.things.insert({name:"C", items:[1,2]})
db.things.insert({name:"D", items:[5,6]})
db.things.insert({name:"E", items:[9,10]})
db.things.insert({name:"F", items:[1,5]})

Data:

{ "_id" : ObjectId("57974a0d356baff265710a1c"), "name" : "A", "items" : [ 1, 2, 3, 4, 5 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1d"), "name" : "B", "items" : [ 2, 4, 6, 8 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1e"), "name" : "C", "items" : [ 1, 2 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1f"), "name" : "D", "items" : [ 5, 6 ] },
{ "_id" : ObjectId("57974a0d356baff265710a20"), "name" : "E", "items" : [ 9, 10 ] },
{ "_id" : ObjectId("57974a1a356baff265710a21"), "name" : "F", "items" : [ 1, 5 ] }

For example: things.mane.A intercept things.mane.C intercept things.mane.F:

[ 1, 2, 3, 4, 5 ] intercept [ 1, 2 ] intercept [ 1, 5 ]

Must be: [1]

I think that it's doable using $setIntersectionbut I can't find the way.

I can do it with two documents but how to do it with more ?

    db.things.aggregate({$match:{"name":{$in:["A", "F"]}}}, 
    {$group:{_id:null, "setA":{$first:"$items"}, "setF":{$last:"$items"} } },
    {
            "$project": { 
                "set1": 1, 
                "set2": 1, 
                "commonToBoth": { "$setIntersection": [ "$setA", "$setF" ] }, 
                "_id": 0 
            }
        }
    )

{ "commonToBoth" : [ 5, 1 ] }

Upvotes: 3

Views: 387

Answers (2)

Bertrand Martel
Bertrand Martel

Reputation: 45352

If your are using mongo 3.2, you could use arrayElemAt to precise all arguments of $setIntersection :

db.things.aggregate([{
    $match: {
        "name": {
            $in: ["A", "B", "C"]
        }
    }
}, {
    $group: {
        _id: 0,
        elements: {
            $push: "$items"
        }
    }
}, {
    $project: {
        intersect: {
            $setIntersection: [{
                "$arrayElemAt": ["$elements", 0]
            }, {
                "$arrayElemAt": ["$elements", 1]
            }, {
                "$arrayElemAt": ["$elements", 2]
            }]
        },
    }
}]);

You would have to dynamically add the require number of JsonObject with index such as :

{
    "$arrayElemAt": ["$elements", <index>]
}

It should match with the number of elements of your input items in ["A", "B", "C"]

If you want to deal with duplicates (some name are present multiple time), regroup all your items by name, $unwind twice and $addToSet to merge all array for a specific $name before executing the previous aggregation :

db.things.aggregate([{
    $match: {
        "name": {
            $in: ["A", "B", "C"]
        }
    }
}, {
    $group: {
        _id: "$name",
        "items": {
            "$push": "$items"
        }
    }
}, {
    "$unwind": "$items"
}, {
    "$unwind": "$items"
}, {
    $group: {
        _id: "$_id",
        items: {
            $addToSet: "$items"
        }
    }
}, {
    $group: {
        _id: 0,
        elements: {
            $push: "$items"
        }
    }
}, {
    $project: {
        intersect: {
            $setIntersection: [{
                "$arrayElemAt": ["$elements", 0]
            }, {
                "$arrayElemAt": ["$elements", 1]
            }, {
                "$arrayElemAt": ["$elements", 2]
            }]
        },
    }
}]);

It isn't a clean solution but it works

Upvotes: 1

DAXaholic
DAXaholic

Reputation: 35338

A solution which is not specific to the number of input items could look like so:

db.things.aggregate(
    {
        $match: {
            "name": {
                $in: ["A", "F"]
            }
        }
    },
    {
        $group: {
            _id: "$items",
            count: {
                $sum: 1
            }
        }
    },
    { 
        $group: { 
            _id: null,
            totalCount: {
                $sum: "$count"
            },
            items: {
                $push: "$_id"
            }
        }
    },
    {
        $unwind: {
            path: "$items"
        }
    },
    { 
        $unwind: {
            path: "$items"
        }
    },
    { 
        $group: {
            _id: "$items",
            totalCount: {
                $first: "$totalCount"
            },            
            count: { 
                $sum: 1
            }
        }
    },
    {
        $project: {
            _id: 1,
            presentInAllDocs: {
                $eq: ["$totalCount", "$count"]
            }
        }
    },
    {
        $match: {
            presentInAllDocs: true
        }
    },
    {
        $group: {
            _id: null,
            items: {
                $push: "$_id"
            }
        }
    }
)

which will output this

{
    "_id" : null,
    "items" : [ 
        5, 
        1
    ]
}

Of course you can add a last $project stage to bring the result into the desired shape.


Explanation

The basic idea behind this is that when we count the number of documents and we count the number of occurrences of each item, then the items with a count equal to the total document count appeared in each document and are therefore in the intersection result.
This idea has one important assumption: your items arrays have no duplicates in it (i.e. they are sets). If this assumption is wrong, then you would have to insert an additional stage at the beginning of the pipeline to turn the arrays into sets.
One could also build this pipeline in a different and probably shorter way but I tried to keep the resource usage as low as possible and therefore added possibly unnecessary (from the functional point of view) stages. For example, the second stage groups by the items array as my assumption is that there are far fewer different values/arrays than documents so the rest of the pipeline has to work with a fraction of the initial document count. However, from the functional point of view, we just need the total count of documents and therefore we could skip that stage and just make a $group stage counting all documents and pushing them into an array for later usage - which of course is a big hit for memory consumption as we have now an array of all possible documents.

Upvotes: 2

Related Questions