rkd
rkd

Reputation: 23

How to match and group array elements with the max value in aggregation

I need help to get the array element having maximum value of a field(level) from a document. Then count the total occurences grouped by array element field "bssid".

Consider the following data

/* 1 */
{
    "_id" : "18:59:36:0c:94:a3",
    "timestamp" : "1460012567",
    "apdata" : [{
        "bssid" : "f4:b7:e2:56:e4:20",
        "ssid" : "Test Network2",
        "level" : -55
    }, {
        "bssid" : "b8:a3:86:67:03:56",
        "ssid" : "Test Network1",
        "level" : -76
    }]
}
/* 2 */
{
    "_id" : "d0:b3:3f:b9:42:38",
    "timestamp" : "1460013345",
    "apdata" : [{
        "bssid" : "f4:b7:e2:56:e4:20",
        "ssid" : "Test Network2",
        "level" : -65
    }, {
        "bssid" : "b8:a3:86:67:03:56",
        "ssid" : "Test Network1",
        "level" : -46
    }]
}
/* 3 */
{
    "_id" : "d0:b3:3f:b9:42:41",
    "timestamp" : "1460013145",
    "apdata" : [{
        "bssid" : "f4:b7:e2:56:e4:20",
        "ssid" : "Test Network2",
        "level" : -65
    }, {
        "bssid" : "b8:a3:86:67:03:56",
        "ssid" : "Test Network1",
        "level" : -46
    }]
 }

The output required is

{
    "bssid" : "f4:b7:e2:56:e4:20",
    "ssid" : "Test Network2",
    "count" : 1
}, {
    "bssid" : "b8:a3:86:67:03:56",
    "ssid" : "Test Network1",
    "count" : 2
}

Which is the count of times each bssid had the maximum value within the array of each document over the whole collection.

Upvotes: 2

Views: 2506

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151170

If you have MongoDB 3.2 available then you can do something like this:

db.sample.aggregate([
  { "$project": {
    "apdata": {
      "$arrayElemAt": [
        { "$filter": {
          "input": "$apdata",
          "as": "el",
          "cond": {
            "$eq": [ 
              "$$el.level",
              { "$max": {
                "$map": {
                  "input": "$apdata",
                  "as": "data",
                  "in": "$$data.level"
                }
              }}
            ]
          }
        }},
        0
      ]
    }
  }},
  { "$group": {
    "_id": "$apdata.bssid",
    "ssid": { "$first": "$apdata.ssid" },
    "count": { "$sum": 1 }
  }}
])

For at least MongoDB 2.6 you need to do this:

db.sample.aggregate([
  { "$unwind": "$apdata" },
  { "$group": {
    "_id": "$_id",
    "apdata": { "$push": "$apdata" },
    "max": { "$max": "$apdata.level" }
  }},
  { "$unwind": "$apdata" },
  { "$redact": {
    "$cond": {
      "if": { "$eq": [ "$apdata.level", "$max" ] },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$group": {
    "_id": "$apdata.bssid",
    "ssid": { "$first": "$apdata.ssid" },
    "count": { "$sum": 1 }
  }}
])

And for MongoDB 2.4 or 2.2 like this:

db.sample.aggregate([
  { "$unwind": "$apdata" },
  { "$group": {
    "_id": "$_id",
    "apdata": { "$push": "$apdata" },
    "max": { "$max": "$apdata.level" }
  }},
  { "$unwind": "$apdata" },
  { "$project": {
    "apdata": 1,
    "isMax": { "$eq": [ "$apdata.level", "$max" ] }
  }},
  { "$match": { "isMax": true } },
  { "$group": {
    "_id": "$apdata.bssid",
    "ssid": { "$first": "$apdata.ssid" },
    "count": { "$sum": 1 }
  }}
])

In all cases $max is used to get the "maximum" value of of the array in each document "first", then you can use that to "filter" the array content prior to using it in a $group. The approaches to this only vary with version

MongoDB 3.2: Allows the $max to work directly on an "array" of values. So the $map is used to just get the "level" values and find out what that "max" actually is.

Then the $filter can be used to just return the array element which matches that "max" value, and finally $arrayElemAt is used to return that "only" ( out of two possible and "zero" index ) element as a plain document.

The whole process can be done in $group "only" if you basically repeat that whole statement for both the _id and in order to get the $first "ssid" value, but it's a bit easier to write in a $project separately to demonstrate.

MongoDB 2.6: This lacks the fancier operators and most notably the ability of $max to work "directly" on an array. The notable thing is the need to $unwind the array first and then actually $group just on the original document, solely in order to get that "max" value.

Then the process really needs you to $unwind again since you will be grouping on the element from the array later, and then use $redact to filter the content. This is a "logical" form of $match where you can directly compare the "level" against the computed "max" from the earlier stage. So the element that is not the "max" is removed.

MongoDB 2.4: Is again basically the same logic, except instead of $redact you actually need the physical $project in order to put a field in the document to use in filtering with $match.


All versions have the same final $group, where you supply the path to "apdata.bssid" for the grouping key and the $first result on that grouping boundary for the "ssid" and a simple $sum to count the occurrences of the grouping key in the results.

Everything returns just as follows:

{ "_id" : "f4:b7:e2:56:e4:20", "ssid" : "Test Network2", "count" : 1 }
{ "_id" : "b8:a3:86:67:03:56", "ssid" : "Test Network1", "count" : 2 }

Actually the most "efficient" form for MongoDB 3.2 would be as follows:

db.sample.aggregate([
  { "$group": {
    "_id": {
      "$arrayElemAt": [
        { "$map": {
          "input": {
            "$filter": {
              "input": "$apdata",
              "as": "el",
              "cond": {
                "$eq": [ 
                  "$$el.level",
                  { "$max": {
                    "$map": {
                      "input": "$apdata",
                      "as": "data",
                      "in": "$$data.level"
                    }
                  }}
                ]
              }
            }            
          },
          "as": "apdata",
          "in": {
            "bssid": "$$apdata.bssid",
            "ssid": "$$apdata.ssid"
          }
        }},
        0
      ]
    },
    "count": { "$sum": 1 }
  }}
])

With a slightly different form due to the compound _id, but it is a single $group stage only, without repetition of the whole process to find the array element data for the "max" value:

  {
    "_id" : {
      "bssid" : "b8:a3:86:67:03:56",
      "ssid" : "Test Network1"
    },
    "count" : 2
  }
  {
    "_id" : {
      "bssid" : "f4:b7:e2:56:e4:20",
      "ssid" : "Test Network2"
    },
    "count" : 1
  }

Upvotes: 1

Related Questions