Alexandre Justino
Alexandre Justino

Reputation: 1726

MongoDB distinct query

I have a collection of products, in which some of the products have the same URL and I can't avoid having those duplicates (The db is populated by a third party).

I'm trying to come up with an efficient query to fetch only distinct products (without getting two products with the same url).

This is what I've come up with so far:

db.product.find({_id: {$in: 
  db.product.aggregate([
    {
        $match: {
            searchName: /iphone 8.*/i,
            isAvailable: true
        }
    },
    {
        $group: {
            _id: "$url",
            id: {$max: "$_id"}
        }
    }
  ]).map(a => a.id)
}}).limit(10);

The inner aggregate will return the IDs of the distinct products, and the outer find will fetch them.

It works fine, but it takes more than twice the time that a regular query. Is there a better way to do this?

Thanks!

Upvotes: 0

Views: 258

Answers (1)

Alex Blex
Alex Blex

Reputation: 37018

You are making 2 queries - first one to get ids and the second one to fetch documents by ids. You can perfectly combine both in a single pipeline:

db.product.aggregate([
{ $sort: {_id: -1} },
{
    $match: {
        searchName: /iphone 8.*/i,
        isAvailable: true
    }
},
{
    $group: {
        _id: "$url",
        docs: {$push: "$$ROOT"}
    }
},
{ $replaceRoot: { newRoot: { $arrayElemAt: [ "$docs", 0 ] } } }
])

$replaceRoot is available since v3.4. A workaround for this stage to be used in older versions could be

{ $project: {_id: 0, product: { $arrayElemAt: [ "$docs", 0 ] } } }

Upvotes: 1

Related Questions