Matthew Moisen
Matthew Moisen

Reputation: 18299

MongoDB How to get distinct list of sub-documents where the sub-documents' attribute is equal to some value?

db.test.insert( 
    {
        'name': 'outer',
        'foos': [
            {
                'name': 'a',
                'type': 'bar',
            },
            {
                'name': 'a',
                'type': 'bar',
            },
            {
                'name': 'z',
                'type': 'baz',
            },
            {
                'name': 'z',
                'type': 'baz',
            },
        ]
    }
)

How can I obtain a distinct list of foos where foo.type is equal to bar?

I want to find:

[
    {
        'name': 'a',
        'type': 'bar'
    }
]

The following does not work but instead returns a distinct value for all of the foos.

db.test.distinct('foos', {'foos.type': 'bar'})

Upvotes: 3

Views: 1966

Answers (1)

Blakes Seven
Blakes Seven

Reputation: 50416

Yes, Oops! A bit of misunderstanding of the function here. Here's what it returns and I'll explain why:

[
        {
                "name" : "a",
                "type" : "bar"
        },
        {
                "name" : "z",
                "type" : "baz"
        }
]

So the other entry there is "type" equal to "baz", and both are now "distinct", but you missed what you really asked.

You did ask for distinct "foos", which is right. But you also only asked to get that from the "documents" that had an array entry with "type" equal to "bar". This does not "filter" the content to only those array entries, hence why you get the other result.

So you need to "filter" the content before obtaining the "distinct" values. You can only really do that with the .aggregate() method. And $filter is the best method here:

db.test.aggregate([
    // Match documents
    { "$match": { "foos.type": "bar" } },

    // Pre-filter the array
    { "$project": {
        "foos": {
            "$filter": {
                "input": "$foos",
                "as": "el",
                "cond": {
                    "$eq": [ "$$el.type", "bar" ]
                }
            }
        }
     }},

     // Unwind the array
     { "$unwind": "$foos" },

     // Group distinct
     { "$group": {
         "_id": "$foos"
     }}
])

Or in older than MongoDB 3.2, but version 2.6 and above you can alternately use $map with $setDifference:

db.test.aggregate([
    // Match documents
    { "$match": { "foos.type": "bar" } },

    // Pre-filter the array
    { "$project": {
        "foos": {
            "$setDifference": [
                { "$map": {
                    "input": "$foos",
                    "as": "el",
                    "in": {
                        "$cond": [
                            { "$eq": [ "$$el.type", "bar" ] },
                            "$$el",
                            false
                        ]
                    }
                }}
            ]
        }
     }},

     // Unwind the array
     { "$unwind": "$foos" },

     // Group distinct
     { "$group": {
         "_id": "$foos"
     }}
])

That does the same thing, with $map processing each array element and returning either the mactched element or false and $setDiffernce removing the false ones:

And finally in anything older than 2.6:

db.test.aggregate([
    // Match documents
    { "$match": { "foos.type": "bar" } },

     // Unwind the array
     { "$unwind": "$foos" },

     // Filter the denormalized array
     { "$match": { "foos.type": "bar" } },

     // Group distinct
     { "$group": {
         "_id": "$foos"
     }}
])

The general principle is to only leave the array entries behind that match "type" equal to "bar", and ideal to "pre-filter" the array before using $unwind to reduce the work that needs to be processed, as filtering out afterwards would essentially create a new document for every array entry, whether it is a match or not.

At any rate, at some point you need to "de-normalize" the array entries with $unwind and then $group back with "foos" ( sub-document) as the primary key value.

It's not a plain "array" as .distinct() delivers, but is basically how you just "weed out" the array entries you don't want to consider.

That's the thing to remember, as normal query operations do not "filter" array elements, then similarly the query input to .distinct() also does not do this, where removing those elements from consideration is what you intend to do.

Upvotes: 4

Related Questions