Aravind Kumar Anugula

Reputation: 1326

MongoDB: group by multiple keys to get unique documents in a collection

I have a collection whose documents are structured in the following format:

[
 {
    "brand" : "Toshiba",
    "title" : "Toshiba Pors 7220CT / NW2",
    "category" : "notebooks",
    "code" : "ABCDTESTASD12",
    "pid" : "45790"
 }, 
 {
    "brand" : "Toshiba",
    "title" : "Toshiba Satellite Pro 4600 PIII800",
    "category" : "notebooks",
    "ean" : "PATDSRESSSN12",
    "pid" : "12345"
 }
]

Could you suggest a query that groups documents with the same brand, title, category and code, so that I can see the unique docs in the collection?

Upvotes: 1

Views: 1393

Answers (2)

chridam

Reputation: 103365

Because aggregation pipeline stages have a maximum memory limit (100 MB of RAM per stage), use the following pipeline, which copes with large datasets by setting the allowDiskUse option to true so that stages can write data to temporary files. Within the pipeline, the $match stage filters out the dupes so that only the unique documents remain, which you can then query by _id:

var pipeline = [
    {
        "$group": { /* Group by fields to match on brand, title, category and code */
            "_id": { 
                "brand": "$brand", 
                "title": "$title", 
                "category": "$category", 
                "code": "$code" 
            },
            "count": { "$sum": 1 }, /* Count number of matching docs for the group */
            "docs": { "$push": "$_id" }, /* Save the _id for matching docs */
            "pids": { "$push": "$pid" } /* Save the matching pids to list */
        }
    },
    { "$match": { "count": 1 } }, /* filter out dupes */
    { "$out": "result" } /* Output aggregation results to another collection */
],
options = { "allowDiskUse": true, cursor: {} };

db.products.aggregate(pipeline, options); // Run the aggregation operation

db.result.find(); // Get the unique documents
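
To see what the $group and $match stages compute without a running MongoDB instance, here is a minimal in-memory sketch in plain JavaScript. It is an illustration only, not how the server implements aggregation. The helper groupByKeys and the second sample document (pid "99999") are hypothetical, added so that one group contains a duplicate. A missing field such as "code" is treated as null here, which simplifies MongoDB's actual handling of missing fields.

```javascript
// In-memory sketch of the $group + $match stages (no MongoDB required).
// Sample documents are modelled on the ones in the question; the second
// one is a hypothetical duplicate added for illustration.
const products = [
  { brand: "Toshiba", title: "Toshiba Pors 7220CT / NW2", category: "notebooks", code: "ABCDTESTASD12", pid: "45790" },
  { brand: "Toshiba", title: "Toshiba Pors 7220CT / NW2", category: "notebooks", code: "ABCDTESTASD12", pid: "99999" },
  { brand: "Toshiba", title: "Toshiba Satellite Pro 4600 PIII800", category: "notebooks", ean: "PATDSRESSSN12", pid: "12345" }
];

const KEY_FIELDS = ["brand", "title", "category", "code"];

// Hypothetical helper: groups docs by the given fields and collects,
// per group, a count and the list of pids (like $sum and $push).
function groupByKeys(docs, fields) {
  const groups = new Map();
  for (const doc of docs) {
    const id = {};
    for (const f of fields) {
      // Assumption: a missing field is treated as null.
      id[f] = doc[f] === undefined ? null : doc[f];
    }
    // Serialise the key values in a fixed order to use them as a Map key.
    const key = JSON.stringify(fields.map((f) => id[f]));
    if (!groups.has(key)) {
      groups.set(key, { _id: id, count: 0, pids: [] });
    }
    const group = groups.get(key);
    group.count += 1;
    group.pids.push(doc.pid);
  }
  return [...groups.values()];
}

const groups = groupByKeys(products, KEY_FIELDS);
const unique = groups.filter((g) => g.count === 1); // like { $match: { count: 1 } }
const dupes = groups.filter((g) => g.count > 1);    // like { $match: { count: { $gt: 1 } } }

console.log(JSON.stringify(unique, null, 2));
console.log(JSON.stringify(dupes, null, 2));
```

Filtering on count equal to 1 keeps only key combinations that occur once; switching the filter to count greater than 1 lists the duplicated combinations instead, along with the pids that share them.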

Upvotes: 0

AleFranz

Reputation: 771

You could use the $group operator from the aggregation framework:

db.computers.aggregate(
    [
        {
            $group : {
                _id : { brand: "$brand", title: "$title", category: "$category", code: "$code" },
                count: { $sum: 1 }
            }
        }
    ]
)

Upvotes: 1
