Reputation: 1726
I have a collection of products, in which some of the products have the same URL and I can't avoid having those duplicates (The db is populated by a third party).
I'm trying to come up with an efficient query to fetch only distinct products (without getting two products with the same url).
This is what I've come up with so far:
db.product.find({_id: {$in:
db.product.aggregate([
{
$match: {
searchName: /iphone 8.*/i,
isAvailable: true
}
},
{
$group: {
_id: "$url",
id: {$max: "$_id"}
}
}
]).map(a => a.id)
}}).limit(10);
The inner aggregate
will return the IDs of the distinct products, and the outer find
will fetch them.
It works fine, but it takes more than twice the time that a regular query. Is there a better way to do this?
Thanks!
Upvotes: 0
Views: 258
Reputation: 37018
You are making 2 queries - first one to get id
s and the second one to fetch documents by id
s. You can perfectly combine both in a single pipeline:
db.product.aggregate([
{ $sort: {_id: -1} },
{
$match: {
searchName: /iphone 8.*/i,
isAvailable: true
}
},
{
$group: {
_id: "$url",
docs: {$push: "$$ROOT"}
}
},
{ $replaceRoot: { newRoot: { $arrayElemAt: [ "$docs", 0 ] } } }
])
$replaceRoot is available since v3.4. A workaround for this stage to be used in older versions could be
{ $project: {_id: 0, product: { $arrayElemAt: [ "$docs", 0 ] } } }
Upvotes: 1