rawat
rawat

Reputation: 165

MongoDB find query in list of dictionaries

I have a collection as below

            {
            "state" : "VIC",
            "sites" : 
            [
                {
                    "name" : "VIC01",
                    "pes" :
                        [
                            {
                                "id" : "1",
                                "longname" : "rdnvej300exh0443",
                                "shortname" : "RVE4-E-043",
                                "location" : "Exhibition"
                            },
                            {
                                "id" : "2",
                                "longname" : "rdnvej160pee0343",
                                "shortname" : "RV3W-E-043",
                                "location" : "Windsor"
                            },
                            {
                                "id" : "3",
                                "location" : "my home"
                            }
                        ],
                    "partners" :
                        [
                            {
                                "id" : "REACH",
                                "fnns" : ["N54321R","N24686R","N46818R","N10461R"]
                            },
                            {
                                "id" : "NCS_CORE",
                                "fnns" : [ "N54320R","N71311R","N35797R","N57919R"]
                            }
                        ]           
                },

                {
                    "name" : "CLAYTON",
                    "pes" :
                        [
                            {
                                "id" : "1",
                                "longname" : "rdnvej1822da0o43",
                                "shortname" : "RVCZ-E-043",
                                "location" : "Clayton"
                            },
                            {
                                "id" : "2",
                                "longname" : "rdnvej1822da0o44",
                                "shortname" : "RVCZ-E-044",
                                "location" : "Clayton"
                            }
                        ],
                    "partners" :
                        [
                            {
                                "id" : "NCS_CORE",
                                "fnns" : ["N54331R","N24686R","N46818R","N10461R"]
                            },
                            {
                                "id" : "NCS_CLAYTON_OPS2",
                                "fnns" : [ "N54321R","N71311R","N35797R","N57919R"]
                            }
                        ]
                }
            ]
            }

I am trying to retrieve

{state, sites.name,sites.partners.id}

from all the documents having "N54321R" in "fnns" list.

I tried below queries but for the above it is giving all sites.partners.id.

1. db.topology.find( {"sites.partners.fnns" : { "$elemMatch" : { "$eq" : "N54321R" } } }, {"state" : 1 , "sites.name" : 1, "sites.partners.id" : 1 , "sites.partners.fnns" : 1} )


2. db.topology.find( {"sites.partners.fnns" : { $in : ["N54321R"] } }, {"state" : 1 , "sites.name" : 1, "sites.partners.id" : 1 , "sites.partners.fnns" : 1} )


3. db.topology.find( {"sites.partners.fnns" : "N54321R"} , {"state" : 1 , "sites.name" : 1, "sites.partners.id" : 1 , "sites.partners.fnns" : 1} )

Correct output should be (just showing sites.partners.id)

{ REACH , NCS_CLAYTON_OPS2 }

but it is giving all sites.partners.id i.e.

{ REACH , NCS_CORE , NCS_CORE , NCS_CLAYTON_OPS2 }

What should I modify in the query to achieve the results?

If schema change required then please tell me what should be the correct schema?

Thanks.

Upvotes: 3

Views: 5766

Answers (2)

Blakes Seven
Blakes Seven

Reputation: 50426

The basic "projection" offered through the .find() method is not able to filter content from arrays in the way you are asking. The positional $ match for more than one element at the outer level or specifying specific elements of subdocuments is just not handled here.

Instead you need .aggregate() in order to filter out the content, so in order to just return those "partners" elements that have the matching value the most efficient way possible at present is this:

Just Solving for the lazy or impatient

db.collection.aggregate([
    { "$match": { "sites.partners.fddn": "N54321R" } },
    { "$project": {
        "state": 1,
        "sites": {
            "$setDifference": [
                { "$map": {
                    "input": "$sites",
                    "as": "site",
                    "in": {
                        "$let": {
                            "vars": {
                                "partners": {
                                    "$setDifference": [
                                        { "$map": {
                                            "input": "$$site.partners",
                                            "as": "partner",
                                            "in": {
                                                "$cond": [
                                                    { "$setIsSubset": [ ["N54321R"], "$$partner.fnns" ] },
                                                    "$$partner",
                                                    false
                                                ]
                                            }
                                        }},
                                        [false]
                                    ]
                                }
                            },
                            "in": {
                                "$cond": [
                                    { "$eq": [{ "$size": "$$partners" }, 0] },
                                    false,
                                    {
                                        "name": "$$site.name",
                                        "pes": "$$site.pes",
                                        "partners": "$$partners"
                                    }
                                ]
                            }
                        }
                    }
                }},
                [false]
            ]
        }
    }}
])

Which will return the filtered result like:

{
    "_id" : ObjectId("564433c2f7a4adee6c13205b"),
    "state" : "VIC",
    "sites" : [
            {
                    "name" : "VIC01",
                    "pes" : [
                            {
                                    "id" : "1",
                                    "longname" : "rdnvej300exh0443",
                                    "shortname" : "RVE4-E-043",
                                    "location" : "Exhibition"
                            },
                            {
                                    "id" : "2",
                                    "longname" : "rdnvej160pee0343",
                                    "shortname" : "RV3W-E-043",
                                    "location" : "Windsor"
                            },
                            {
                                    "id" : "3",
                                    "location" : "my home"
                            }
                    ],
                    "partners" : [
                            {
                                    "id" : "REACH",
                                    "fnns" : [
                                            "N54321R",
                                            "N24686R",
                                            "N46818R",
                                            "N10461R"
                                    ]
                            }
                    ]
            },
            {
                    "name" : "CLAYTON",
                    "pes" : [
                            {
                                    "id" : "1",
                                    "longname" : "rdnvej1822da0o43",
                                    "shortname" : "RVCZ-E-043",
                                    "location" : "Clayton"
                            },
                            {
                                    "id" : "2",
                                    "longname" : "rdnvej1822da0o44",
                                    "shortname" : "RVCZ-E-044",
                                    "location" : "Clayton"
                            }
                    ],
                    "partners" : [
                            {
                                    "id" : "NCS_CLAYTON_OPS2",
                                    "fnns" : [
                                            "N54321R",
                                            "N71311R",
                                            "N35797R",
                                            "N57919R"
                                    ]
                            }
                    ]
            }
    ]
}

Explaination of what is happening for the curious

There are a few things going on here, especially if you are not familiar with the aggregation framework so it's best to step through them.

First there is the $match pipeline stage, which is essentially a "query" just like you issue as the first argument to .find(). The path used there will select documents that acutally have the matching element:

    { "$match": { "sites.partners.fddn": "N54321R" } },

So sites.partners.fddn is the "path" to match correctly for the value you are looking for. "Dot notation" is the correct form for this at this stage. It matches a document, but looking at the positional $ operator the array match is really only counted as the element of sites, and just the first one where a match occurred. So that is why this "projection" is of no use for your purpose, but you still should $match to narrow down the documents you actually want.

Then there is $project, which is an extended form of what can be done with standard projection and actually allows complete document manipulation that cannot be done with .find().

Within this you have compounded arrays, and these are what you need to inspect and filter. The tools for this are $map and $setDifference respectively.

Essentially, $map looks at each element of the array and allows conditions to be processed against each element. In this case the operation you want is $cond which is a ternary operator. It returns a value depending on the condition in it's first argument, being either the second argument where true or the third argument where false.

If you first look at the innermost part of this:

{ "$map": {
    "input": "$$site.partners",
    "as": "partner",
    "in": {
        "$cond": [
            { "$setIsSubset": [ ["N54321R"], "$$partner.fnns" ] },
            "$$partner",
            false
        ]
    }
}},

This is looking at each element of the inner "partners" array and doing another comparison on it's "ffns" array. Speeding this up a bit is $setIsSubset, which compares your value ( specified as an array/set itself ) to see if it is actually a "subset" ( contained member(s) ) of the array/set being tested. This is another logical true/false result also fed to a $cond, which is either returning the element or false where the condition was not met.

The $setDifference shown in various places is always doing the same job, which is since $map is just returning either wanted values of the false result for each element, this comparison removes all false elements, just leaving behind the content with matching values.

Finally there is $let which allows declaration of variables to avoid repeated declarations of logic. Though it does not apply to your data in this sample, it is a possibility that the resulting filtering of "partners" could in fact be an empty array. This would generally not be desired in the results.

Therefore in the $map processing of the "sites" array, we work out each filtered result for "partners" first, then test that result to see if it actually has some length. If it does not, then a similar false is returned, otherwise the content from the element is returned "with" the filtered version of "partners" in place of the original array for each member. The same filtering of $false is then applied.

It can be a bit to take in, but worth studying and understanding. It is also possible to process using stages with $unwind, but should be avoided since that would have a significant performance impact. However the aggregation pipeline as shown here has basically the same performance characteristics of any standard query, with very little additional overhead.

Upvotes: 6

Michel Floyd
Michel Floyd

Reputation: 20256

Your collection only has one document. It has a complex substructure with nested arrays of objects but db.topology.find() is always going to return either 0 or 1 document on this collection. That document is going to include all the elements of all the nested arrays.

Think of $elemMatch as find me all the complete documents where any of the array elements contain the value specified. That's very different from what you're trying to do which is find me all the array elements from a document that contain the value specified and exclude the ones that do not.

To be able to search partners independently you can create separate collections for sites, pes, and partners and have pes and partners include keys that reference the parent site.

Upvotes: 0

Related Questions