Gonzalo Santiago
Gonzalo Santiago

Reputation: 348

How to ggregate two collections and match field with array

I need to group the results of two collections candidatos and ofertas, and then "merge" those groups to return an array with matched values.

I've created this example with the aggregate and similar data to make this easier to test:

https://mongoplayground.net/p/m0PUfdjEye4

This is the explanation of the problem that I'm facing.

I can get both groups with the desired results independently:

candidatos collection:

db.getCollection('ofertas').aggregate([
{"$group" : {_id:"$ubicacion_puesto.provincia", countProvinciaOferta:{$sum:1}}} 
 ]);

This is the result...

enter image description here

ofertas collection:

db.getCollection('candidatos').aggregate([
{"$group" : {_id:"$que_busco.ubicacion_puesto_trabajo.provincia", countProvinciaCandidato:{$sum:1}}} 
 ]);

This is the result...

enter image description here

What I need to do, is to aggregate those groups to merge their results based on their _id coincidence. I think I'm going in the right way with the next aggregate, but the field countOfertas always returns 0.0. I think that there is something wrong in my project $cond, but I don't know what is it. This is the aggregate:

db.getCollection('candidatos').aggregate([
    {"$group" : {_id:"$que_busco.ubicacion_puesto_trabajo.provincia", countProvinciaCandidato:{$sum:1}}},
    
            {
            $lookup: {
                from: 'ofertas',
                let: {},
                pipeline: [
                    {"$group" : {_id:"$ubicacion_puesto.provincia", countProvinciaOferta:{$sum:1}}} 
                ],
                as: 'ofertas'
            }
        },
     
       {
    $project: {
        _id: 1,
        countProvinciaCandidato: 1,
      countOfertas: {
          $cond: {
            if: {
              $eq: ['$ofertas._id', "$_id"]
            },
            then: '$ofertas.countProvinciaOferta',
            else: 0,
          }
      }
    }
  },   
        { $sort: { "countProvinciaCandidato": -1}},
        { $limit: 20 }
 ]); 

And this is the result, but as you can see, field countOfertas is always 0

enter image description here

Any kind of help will be welcome

Upvotes: 0

Views: 66

Answers (2)

varman
varman

Reputation: 8894

What you have tried is so much appreciated. But in $project you need to use $reduce which helps to loop through the array and satisfy the condition

Here is the code

db.candidatos.aggregate([
  {
    "$group": {
      _id: "$que_busco.ubicacion_puesto_trabajo.provincia",
      countProvinciaCandidato: { $sum: 1 }
    }
  },
  {
    $lookup: {
      from: "ofertas",
      let: {},
      pipeline: [
        {
          "$group": {
            _id: "$ubicacion_puesto.provincia",
            countProvinciaOferta: { $sum: 1 }
          }
        }
      ],
      as: "ofertas"
    }
  },
  {
    $project: {
      _id: 1,
      countProvinciaCandidato: 1,
      countOfertas: {
        "$reduce": {
          "input": "$ofertas",
          initialValue: 0,
          "in": {
            $cond: [
              { $eq: [ "$$this._id", "$_id" ] },
              { $add: [ "$$value", 1 ] },
              "$$value"
            ]
          }
        }
      }
    }
  },
  { $sort: { "countProvinciaCandidato": -1 } },
  { $limit: 20 }
])

Working Mongo playground

Note : If you need to do with aggregations only, this is fine. But I personally feel this approach is not good. My suggestion is, you can concurrently call group aggregations in different service and do it with programmatically. Because $lookup is expensive, when you get massive data, this performance will be reduced

Upvotes: 1

Joe
Joe

Reputation: 28326

The $eq in the $cond is comparing an array to an ObjectId, so it never matches.

The $lookup stage results will be in the ofertas field as an array of documents, so '$ofertas._id' will be an array of all the _id values.

You will probably need to use $unwind, $reduce after the $lookup.

Upvotes: 1

Related Questions