Reputation: 1326
I have collection and documents are structured in the following format:
[
{
"brand" : "Toshiba",
"title" : "Toshiba Pors 7220CT / NW2",
"category" : "notebooks",
"code" : "ABCDTESTASD12",
"pid" : "45790"
},
{
"brand" : "Toshiba",
"title" : "Toshiba Satellite Pro 4600 PIII800",
"category" : "notebooks",
"ean" : "PATDSRESSSN12",
"pid" : "12345"
}
]
Could you suggest me the query to find unique documents which have same brand,title,category,code so that I can see unique docs in collection.
Upvotes: 1
Views: 1393
Reputation: 103365
Because the aggregation pipeline stages have maximum memory use limit, use the following pipeline which deals with large datasets by setting allowDiskUse
option to true thus enables writing data to temporary files.
Within the pipeline, use the $match
stage to filter out dupes so that you only remain with the unique documents which you can query by _id
:
var pipeline = [
{
"$group": { /* Group by fields to match on brand, title, category and code */
"_id": {
"brand": "$brand",
"title": "$title",
"category": "$category",
"code": "$code"
},
"count": { "$sum": 1 }, /* Count number of matching docs for the group */
"docs": { "$push": "$_id" }, /* Save the _id for matching docs */
"pids": { "$push": "$pid" } /* Save the matching pids to list */
}
},
{ "$match": { "count": 1 } }, /* filter out dupes */
{ "$out": "result" } /* Output aggregation results to another collection */
],
options = { "allowDiskUse": true, cursor: {} };
db.products.aggregate(pipeline, options); // Run the aggregation operation
db.result.find(); // Get the unique documents
Upvotes: 0
Reputation: 771
you could use the group operator from the aggregation framework:
db.computers.aggregate(
[
{
$group : {
_id : { brand: "$brand", title: "$title", category: "$category", code: "$code" },
count: { $sum: 1 }
}
}
]
)
Upvotes: 1