Kyaw Minn
Kyaw Minn

Reputation: 35

Get Document array counts by conditions, then join to another collection

I have two sample collections

  1. Govenor Info
  2. Region Info.

I want to:

  1. display regions info
  2. count regions and cities
  3. join govenor info

I try 3 methods at the button of question. Each one needs one of my requirements. Hope someone can fill it.

NOTE. you can say why not govenor info in the same collection regionInfo. I have reason that "govenor" collection is connected with other collections as well. So, I need separate "govenor" collection.

govenor

[{
    "_id" : "007",
    "name" : {
        "fname" : "Joe",
        "lname" : "McDee",
            }
},
{
    "_id" : "008",
    "name" : {
        "fname" : "Martin",
        "lname" : "Neilwel",
             }
}]

regionInfo

{
  "country_id" : 328,
  "country_name" : "UK",
 }
  "regions" : [{

    "region_id" : 001,
    "region_name" : "west midlands",
    "govenor_id" : "007"
  },
                {
    "region_id" : 002,
    "region_name" : "east midlands"
    "govenor_id" : "008"
                }
  ],
   "cities" : [{
      "city_id" : 1,
      "region_id" : 001,
      "city_name" : "Birmingham"
    }, {
      "city_id" : 2,
      "region_id" : 001,
      "city_name" : "coventry"
    }, {
      "city_id" : 3,
      "region_id" : 001,
      "city_name" : "warsaw"
    },  {
      "city_id" : 4,
      "region_id" : 002,
      "city_name" : "dudley"
    },
     {
      "city_id" : 5,
      "region_id" : 002,
      "city_name" : "blabla"
    }]
}

Desired Result:

{ 
  "_id" : { "country_name" : "UK", "number_of_region" : 2 },
  "regions" : [
    {
      "region_name" : "west midlands",
      "govenor.fname" : "Joe",
      "cities_in_region" : 3
    },
    {
      "region_name" : "east midlands",
      "govenor.fname" : "Martin",
      "cities_in_region" : 2 
    }
  ]
}

I cannot find the count of cities with this:

db.collection('regionInfo').aggregate( [ 
  { '$unwind': "regionInfo" },  
  { '$lookup':{
    from: "govenor",
    localField: 'regionInfo.govenor_id',
    foreignField: "_id",
    as: "Gjoin"
  }},
  { '$lookup': {
    from: "regionInfo",
    localField: 'regions.region_id',
    foreignField: "cities.gevenor_id",
    as: "govjoin"
  }},
  { '$project' : {
    R_id: "$regionInfo.region_name",
    govenor_fname:"$Gjoin..name.fname",
    govenor_lname:"$Gjoin.name.lname",
  }}
])

If I Use mapReduce, I can count cities(not right format) and not join govenor info

    map = function(){
    for (var index = 0; index < this.regions.length; ++index) {
            var auth = this.regions[ index ];
            emit( auth._id ,{cmt : {'this.cities.region_id' :'this.regions.region_id'}} );
        }}
    reduce = function(auth, counters) {
        count = 0;
        for (var index = 0; index < counters.length; ++index) {
            count += counters[index];
        }
        return count;
    }

This is My Fav Method by Neil Lunn. But I need to implement to join with govenor info:

db.collection.find().map(doc => ({ 
  _id: { 
    country_name: doc.country_name,
    number_of_regions: doc.regions.length
  }, 
  regions: doc.regions.map( r => ({ 
    region_name: r.region_name,
    cities_in_region: doc.cities.filter( c => 
      c.region_id === r.region_id ).length })
   )

}))

Upvotes: 1

Views: 129

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151170

In order to do this, you basically want to follow the pattern of the basic "reshaping" statement I gave you first with a $project stage. Then you do the $unwind on the "regions" array before you $lookup, since you need to do that as $lookup cannot put output "inside" an array:

db.getCollection('regionInfo').aggregate([
  { "$project": {
    "_id": {
      "country_name": "$country_name",
      "number_of_region": { "$size": "$regions" }
    },
    "regions": {
      "$map": {
        "input": "$regions",
        "as": "r",
        "in": {
          "region_name": "$$r.region_name",
          "govenor_fname": "$$r.govenor_id",
          "cities_in_region": {
            "$size": {
              "$filter": {
                "input": "$cities",
                "as": "c",
                "cond": { "$eq": [ "$$c.region_id", "$$r.region_id" ] }
              }   
            }
          }
        }
      } 
    }  
  }},
  { "$unwind": "$regions" },
  { "$lookup": {
    "from": "govenor",
    "localField": "regions.govenor_fname",
    "foreignField": "_id",
    "as": "regions.govenor_fname"
  }},
  { "$addFields": {
    "regions.govenor_fname": {
        "$arrayElemAt": [
          "$regions.govenor_fname.name.fname",
          0
        ]
      }
  }},
  { "$group": {
    "_id": "$_id",
    "regions": { "$push": "$regions" }    
  }}
])

For the first part it all takes place just within the $project as we are just changing the shape of the document, much as was demonstrated in the JavaScript example to simply reshape things.

The "counts" you want are actually achieved by getting the $size from the various arrays in the document. The first is a simple "size" of the "regions" array contained.

The second "count" of "cities" from within each "region" is done by using $filter on the "cities" array in order to match on the current "region". The filtered list is then also simply tested for "size".

Since "regions" is an "array" and we want to join to another collection, we need to "de-normalize" it first. The $unwind does this operation by creating a copy of the document for each array member.

The $lookup can then reference the supplied "govenor_id" which we actually renamed to the target field in the earlier stage. We renamed it there so the order in the result document does not change around.

Since the result of $lookup is also an array ( since it can possibly match many ), we know it's a 1:1 relation here so instead of doing anything else we simply get the first array index from the result with $arrayElemAt.

That happens withing the $addFields pipeline stage that allows us to "only specify 'new' fields instead of all of them. This is added in MongoDB 3.4, but in an earlier version ( 3.2 since we use $lookup ) we would need to specify all fields explicitly using $project.

All that remains now is to bring the document(s) back together, in the state before the $unwind, so we use the $group pipeline stage in order to "group" by the document _id and $push all the "regions" data back into an array form.

The output is then:

{
    "_id" : {
        "country_name" : "UK",
        "number_of_region" : 2
    },
    "regions" : [ 
        {
            "region_name" : "west midlands",
            "govenor_fname" : "Joe",
            "cities_in_region" : 3
        }, 
        {
            "region_name" : "east midlands",
            "govenor_fname" : "Martin",
            "cities_in_region" : 2
        }
    ]
}

Also fixed your data since it had formatting errors in several places. The "govenor" collection stays the same:

regionInfo

{
    "country_id" : 328.0,
    "country_name" : "UK",
    "regions" : [ 
        {
            "region_id" : 1.0,
            "region_name" : "west midlands",
            "govenor_id" : "007"
        }, 
        {
            "region_id" : 2.0,
            "region_name" : "east midlands",
            "govenor_id" : "008"
        }
    ],
    "cities" : [ 
        {
            "city_id" : 1.0,
            "region_id" : 1.0,
            "city_name" : "Birmingham"
        }, 
        {
            "city_id" : 2.0,
            "region_id" : 1.0,
            "city_name" : "coventry"
        }, 
        {
            "city_id" : 3.0,
            "region_id" : 1.0,
            "city_name" : "warsaw"
        }, 
        {
            "city_id" : 4.0,
            "region_id" : 2.0,
            "city_name" : "dudley"
        }, 
        {
            "city_id" : 5.0,
            "region_id" : 2.0,
            "city_name" : "blabla"
        }
    ]
}

Upvotes: 1

Related Questions