MongoDB group by multiple keys to get unique documents in a collection

Question

I have a collection whose documents are structured in the following format:

[
 {

    "brand" : "Toshiba",
    "title" : "Toshiba Pors 7220CT / NW2",
    "category" : "notebooks",
    "code" : "ABCDTESTASD12",
    "pid" : "45790"
 }, 
 {
    "brand" : "Toshiba",
    "title" : "Toshiba Satellite Pro 4600 PIII800",
    "category" : "notebooks",
    "ean" : "PATDSRESSSN12",
    "pid" : "12345"
 }
]

Could you suggest a query to find unique documents, treating documents with the same brand, title, category and code as duplicates, so that I can see the unique docs in the collection?

chridam · Accepted Answer

Aggregation pipeline stages have a maximum memory use limit, so for large datasets set the allowDiskUse option to true, which enables the stages to write data to temporary files. Within the pipeline, group the documents on the four fields, then use a $match stage to filter out the dupes, so that only groups containing a single document remain. Each remaining group holds the _id of its unique document, which you can then use to query the original collection:

var pipeline = [
    {
        "$group": { /* Group by fields to match on brand, title, category and code */
            "_id": { 
                "brand": "$brand", 
                "title": "$title", 
                "category": "$category", 
                "code": "$code" 
            },
            "count": { "$sum": 1 }, /* Count number of matching docs for the group */
            "docs": { "$push": "$_id" }, /* Save the _id for matching docs */
            "pids": { "$push": "$pid" } /* Save the matching pids to list */
        }
    },
    { "$match": { "count": 1 } }, /* filter out dupes */
    { "$out": "result" } /* Output aggregation results to another collection */
];
var options = { "allowDiskUse": true, "cursor": {} };

db.products.aggregate(pipeline, options); // Run the aggregation operation

db.result.find(); // Each result holds the _id (in docs) and pid (in pids) of one unique document
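
To see what the $group and $match stages compute, here is a minimal sketch in plain JavaScript that mimics them on an in-memory array. It makes no MongoDB calls; the groupUnique helper and the sample data are hypothetical and only meant to illustrate the logic, and the handling of missing fields (they are left out of the key) is a choice made for this sketch.

```javascript
// Plain JavaScript sketch (no MongoDB calls): mimics $group + $match on an array.
// groupUnique and the sample data below are hypothetical, for illustration only.
function groupUnique(docs, keys) {
    const groups = new Map();
    for (const doc of docs) {
        // Build the compound key, playing the role of "_id" in the $group stage.
        // In this sketch a missing field is simply left out of the key.
        const id = {};
        for (const k of keys) {
            if (doc[k] !== undefined) id[k] = doc[k];
        }
        const keyString = JSON.stringify(id);
        if (!groups.has(keyString)) {
            groups.set(keyString, { _id: id, count: 0, pids: [] });
        }
        const group = groups.get(keyString);
        group.count += 1;          // plays the role of { "$sum": 1 }
        group.pids.push(doc.pid);  // plays the role of { "$push": "$pid" }
    }
    // Plays the role of { "$match": { "count": 1 } }: keep single-document groups
    return Array.from(groups.values()).filter(g => g.count === 1);
}

const sample = [
    { brand: "Toshiba", title: "A", category: "notebooks", code: "X1", pid: "1" },
    { brand: "Toshiba", title: "A", category: "notebooks", code: "X1", pid: "2" },
    { brand: "Toshiba", title: "B", category: "notebooks", code: "X2", pid: "3" }
];

const unique = groupUnique(sample, ["brand", "title", "category", "code"]);
console.log(unique.map(g => g.pids)); // only the group holding pid "3" remains
```

Note that matching on a count of 1 drops every copy of a duplicated document, not just the extra copies: pids "1" and "2" above both disappear from the result.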
