Gabriel Sartori

Reputation: 33

Count pair words for all combinations

I have a MongoDB collection with documents like these:

{ "_id" : "piramidales", "LiciList" : [ "318081", "318157" ] }
{ "_id" : "pyramidalis", "LiciList" : [ "318081", "318157" ] }
{
        "_id" : "toneis",
        "LiciList" : [
                "318077",
                "318151",
                "318288",
                "318360",
                "318666"
        ]
}

For every pair of words (the _id values), I want to count how many LiciList items the two words have in common.

How can I get this relationship between the LiciList arrays? The result should look like this:

{item1:'piramidales',item2:'pyramidalis',count:2},
{item1:'piramidales',item2:'toneis',count:0},
{item1:'pyramidalis',item2:'toneis',count:0}

Upvotes: 0

Views: 99

Answers (1)

mickl

Reputation: 49985

You can try the following aggregation:

db.col.aggregate([
    {
        $group: {
            _id: null,
            item1: { $push: "$$ROOT" },
            item2: { $push: "$$ROOT" },
        }
    },
    { $unwind: "$item1" },
    { $unwind: "$item2" },
    { 
        $project: { 
            _id: 0,
            item1: "$item1._id", 
            item2: "$item2._id",
            count: { $size: { $setIntersection: [ "$item1.LiciList", "$item2.LiciList" ] } }
        } 
    },
    {
        $redact: {
            $cond: {
                if: { $and: [ { $gt: [ "$item2", "$item1" ] }, { $gt: [ "$count", 0 ] } ] },
                then: "$$KEEP",
                else: "$$PRUNE"
            }
        }
    }
],
{ allowDiskUse: true })

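Basically, you have to generate documents holding every pair (item1, item2), which is why everything is grouped into a single document with two array fields and then unwound twice. $setIntersection returns the elements that the two LiciList arrays share, and $size counts them. $redact then filters out the duplicates: requiring item2 to be greater than item1 (a plain string comparison with $gt) eliminates pairs like (toneis, toneis) or (toneis, pyramidalis), so each unordered pair appears only once, e.g. as (pyramidalis, toneis). The second condition, $gt on count, additionally drops pairs that have nothing in common; remove it if you also want the count: 0 pairs shown in the question.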
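To check the pairing logic outside the database, here is a minimal in-memory sketch in plain JavaScript. It is not part of the aggregation above: `docs` is a hand-copied version of the sample documents, and `countPairs` and its `keepZero` flag are hypothetical names made up for this illustration. It assumes ASCII `_id` values, so that JavaScript string comparison orders them the same way as the `$gt` comparison in the pipeline.

```javascript
// Sample documents copied from the question (assumed to be the whole collection).
const docs = [
  { _id: "piramidales", LiciList: ["318081", "318157"] },
  { _id: "pyramidalis", LiciList: ["318081", "318157"] },
  { _id: "toneis", LiciList: ["318077", "318151", "318288", "318360", "318666"] },
];

// Hypothetical helper mirroring the pipeline stages:
// nested loops ~ $group + double $unwind, set intersection ~ $setIntersection + $size,
// and the two `if` checks ~ the $redact condition.
function countPairs(documents, keepZero = false) {
  const pairs = [];
  for (const a of documents) {
    for (const b of documents) {
      // Keep each unordered pair once and skip self-pairs.
      if (!(b._id > a._id)) continue;
      const inB = new Set(b.LiciList);
      // A Set removes duplicates, as $setIntersection does.
      const shared = new Set(a.LiciList.filter((x) => inB.has(x)));
      // Drop pairs with nothing in common unless the caller asks for them.
      if (shared.size === 0 && !keepZero) continue;
      pairs.push({ item1: a._id, item2: b._id, count: shared.size });
    }
  }
  return pairs;
}

console.log(countPairs(docs));       // same filtering as the pipeline above
console.log(countPairs(docs, true)); // also lists the pairs with count 0
```

With `keepZero` set to `true` the sketch behaves like the pipeline without the `count` condition in `$redact`, which matches the shape of the output requested in the question.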

Upvotes: 1
