Using `$unwind` in MongoDB doubts of different `$sum` outputs

I´ve a doubt using $unwind...

First of all, this is my collection "Provincias.json" that stores info about Spanish regions (short example):

 {
  _id: ObjectId("62095151245a59e4bda99e9a"),
  Nombre: 'Soria',
  CA: 'Castilla y León',
  Superficie: 10306,
  Datos: [
   { Anyo: 2015, Valor: 91780 },
   { Anyo: 2014, Valor: 92630 },
   { Anyo: 2013, Valor: 93575 },
   { Anyo: 2012, Valor: 94463 },
   ..... ]

Where CA is the state name, Nombre the county name, Superficie county surface and Datos the population each year.

Well I´m trying to make a "query" that returns the info grouped by CA with total surface and the difference between max and min population densities.

My expression is as follows:

db.ProvinciasPEC1.aggregate([
   {$unwind: "$Datos"},
   {$group: {
       "_id": "$CA", 
       "totSuperficie": {$sum:"$Superficie"},
       "PoblacionMaxima": {$max: "$Datos.Valor"},
       "PoblacionMinima": {$min: "$Datos.Valor"}}}, 
        {$project: {
           "_id": 1,"totSuperficie": 1,
           "totPoblacion": 1,
           "diffDensidad": { 
               $divide: [ 
                 {$subtract: ["$PoblacionMaxima", "$PoblacionMinima"]} , "$totSuperficie"
               ] 
            }
         }
      },
      {$sort: {"diffDensidad":-1}},
      {$out:"CAM_Densidad"}
    ])

I returns me somethin like this:

enter image description here

But I don´t know why the total Surface is wrong.

Using the following expression I get States with proper total surface:

db.ProvinciasPEC1.aggregate([{$group: {"_id": "$CA", "totSuperficie": {$sum:"$Superficie"}}},{$out:"CAM_Densidad"}])

That returns:

enter image description here

Please, any hint to figure out what I´m doing wrong? I think is related to $unwind, but not sure,

Thanks a lot!

Upvotes: 0

Views: 66

Answers (2)

rickhg12hs
rickhg12hs

Reputation: 11932

Here's one way to do it. Determine the min/max Valor per document, and then $group by CA.

db.collection.aggregate([
  {
    "$set": {
      "maxV": {
        "$max": "$Datos.Valor"
      },
      "minV": {
        "$min": "$Datos.Valor"
      }
    }
  },
  {
    "$group": {
      "_id": "CA",
      "totMaxV": {
        "$max": "$maxV"
      },
      "totMinV": {
        "$min": "$minV"
      },
      "totSf": {
        "$sum": "$Superficie"
      }
    }
  },
  {
    "$project": {
      "diffDensidad": {
        "$divide": [
          {
            "$subtract": [
              "$totMaxV",
              "$totMinV"
            ]
          },
          "$totSf"
        ]
      },
      "CA": "$_id",
      "_id": 0
    }
  }
])

Try it on mongoplayground.net.

Upvotes: 1

R2D2
R2D2

Reputation: 10737

Maybe something like this:

 db.collection.aggregate([
  {
   $unwind: "$Datos"
  },
  {
   $group: {
   "_id": "$CA",
   "totSuperficie": {
    $first: "$Superficie"
   },
   "PoblacionMaxima": {
    $max: "$Datos.Valor"
   },
   "PoblacionMinima": {
    $min: "$Datos.Valor"
   }
   }
  },
  {
   $project: {
    "_id": 1,
    "totSuperficie": 1,
    "totPoblacion": 1,
    "diffDensidad": {
      $divide: [
      {
        $subtract: [
          "$PoblacionMaxima",
          "$PoblacionMinima"
        ]
       },
      "$totSuperficie"
        ]
      }
    }
  },
  {
    $sort: {
     "diffDensidad": -1
   }
  },
  {
    $out: "CAM_Densidad"
  }
])

explained:

  1. Unwind the Datos
  2. Group the per CA (but preserve the Superficie since it is the same value for all unwinded Datos so if you sum them you will get different sum depending on number of Datos per CA )
  3. Project
  4. Sort by diff
  5. Out to the output collection

playground

Upvotes: 0

Related Questions