user3474541
user3474541

Reputation: 167

Mongodb - aggregate function nested ob

We need to calculate the minimum bounding rectangle (MBR) on Geospatial data. In oracle, we have SDO_AGGR_MBR function, is there any similar function in MongoDB.

"coord" : {
    "type" : "Polygon",
    "coordinates" : [ 
        [ 
            [ 
                25.5377574375611, 
                42.8545750237221
            ], 
            [ 
                47.7803203666229, 
                42.8545750237221
            ], 
            [ 
                47.7803203661319, 
                52.0987759993153
            ], 
            [ 
                25.5377574370701, 
                52.0987759993153
            ], 
            [ 
                25.5377574375611, 
                42.8545750237221
            ]
        ]
    ]
}

We have the geometry data like above but the coordinates length may vary. So, we need to find the minx, miny, maxx and maxy from these data.

Upvotes: 2

Views: 329

Answers (2)

buræquete
buræquete

Reputation: 14698

I don't think there is a built-in function for it. But you can simply do the following;

db.collection.aggregate([
  {
    $unwind: "$coordinates"
  },
  {
    $unwind: "$coordinates"
  },
  {
    $group: {
      _id: null,
      minX: {
        $min: { $arrayElemAt: [ "$coordinates", 0 ] }
      },
      maxX: {
        $max: { $arrayElemAt: [ "$coordinates", 0 ] }
      },
      minY: {
        $min: { $arrayElemAt: [ "$coordinates", 1 ] }
      },
      maxY: {
        $max: { $arrayElemAt: [ "$coordinates", 1 ] }
      }
    }
  }
])

Where first unwraps the coordinates array with $unwind (twice for extra [ ] block), so that aggregate pipeline can iterate on it. Then we just use a $group with _id: null which is a special operation to evaluate min/max values for all elements of the array.

which will get you a response you request;

[
  {
    "_id": null,
    "maxX": 47.7803203666229,
    "maxY": 52.0987759993153,
    "minX": 25.5377574370701,
    "minY": 42.8545750237221
  }
]

check on mongoplayground

Upvotes: 1

Neil Lunn
Neil Lunn

Reputation: 151142

The most efficient way is to employ the $map and $reduce operators along with $let. This allows you to process each document by manipulating the array's inline and then use $min and $max operators to obtain the bounding values:

db.collection.aggregate([
  { "$replaceRoot": {  
    "newRoot": {
      "$let": {
        "vars": {
          "m": {
            "$map": {
              "input": {
                "$reduce": {
                  "input": "$coord.coordinates",
                  "initialValue": [],
                  "in": {
                    "$concatArrays": [ "$$value", "$$this"]
                  }
                }
              },
              "in": {
                "x": { "$arrayElemAt": [ "$$this", 0 ] },
                "y": { "$arrayElemAt": [ "$$this", 1 ] }
              }
            }
          }
        },
        "in": {
          "_id": "$_id",
          "coord": "$coord",
          "minX": { "$min": "$$m.x" },
          "minY": { "$min": "$$m.y" },
          "maxX": { "$max": "$$m.x" },
          "maxY": { "$max": "$$m.y" }
        }
      }
    }
  }}
])

And the output, would be like:

{
        "_id" : ObjectId("5d9330e95994eb7018f59218"),
        "coord" : {
                "type" : "Polygon",
                "coordinates" : [
                        [
                                [
                                        25.5377574375611,
                                        42.8545750237221
                                ],
                                [
                                        47.7803203666229,
                                        42.8545750237221
                                ],
                                [
                                        47.7803203661319,
                                        52.0987759993153
                                ],
                                [
                                        25.5377574370701,
                                        52.0987759993153
                                ],
                                [
                                        25.5377574375611,
                                        42.8545750237221
                                ]
                        ]
                ]
        },
        "minX" : 25.5377574370701,
        "minY" : 42.8545750237221,
        "maxX" : 47.7803203666229,
        "maxY" : 52.0987759993153
}

Note the usage of the $replaceRoot aggregation pipeline stage, as this will allow the nested expressions with $let to essentially provide global variables to the document to produce that can be utilized in any output property.

The $reduce here basically serves the function to flatten the array from the standard GeoJSON form into just an array of coordinate pairs, without the additional bounding array.

This then feeds input to the $map which employs $arrayElemAt in order to re-map each coordinate pair into an object with x and y keys. This makes things much more simple for the actual execution or output section of the $let.

Note: An alternate approach to using $arrayElemAt against each key within the $map might well be to use $zip and $arrayToObject:

   "in": {
        "$arrayToObject": { "$zip": { "inputs": [ ["x","y"], "$$this" ] } }
   }

It has the same principle in the overall output, but takes advantage of $zip producing "paired" arrays, which also happens to be valid input for $arrayToObject to produce the final object form.

In the final part, we now basically have an array of objects with the named keys x and y. MongoDB allows a convenient way to remap simply the values for those named keys with notation like "$$m.x" where the "$$m expression related to the named variable of the $let and is our array of objects, and the .x part of course simply means only the values of x. This is a basic shorthand for a $map statement in itself, which suits this special usage case.

These arrays of values for specific properties can now be applied to the $min and $max operators, and this is how you get you min and max coordinates for a bounding rectangle.


Note that inline operators for arrays should always be preferred to $unwind.

The $unwind aggregation pipeline stage was an old introductory way of dealing with array elements by essentially flattening them into separate documents.

Though necessary when you actually want to group on a value as a key that comes from within an array, most operations which don't actually need that ( like this one ) can be done with more modern approaches.

The usage of $unwind is actually a huge performance penalty, due to it's function essentially being to replicate the entire parent document content of the field containing the array into it's own new document. Particularly in large datasets this has a very negative effect on performance, due to much increases I/O and memory usage.

The main lesson being, unless it's necessary to the operation being performed ( and here it is not ) then you should not be using $unwind in an aggregation pipeline. It might look easier to understand, but your code is actually hurting the system it's running on by including it.


Alternate Client Approach

Note also that if you don't actually need these results for any further aggregation processing, then it's probably a lot cleaner to code in the client as each document is processed.

For example, here's a plain JavaScript version for the shell:

db.collection.find().map(({ _id, coord }) =>
  (({ coordinates }) =>
    ({
      _id,
      coord,
      ...(m => 
        ({
          minX: Math.min(...m.map(({ x }) => x)),
          minY: Math.min(...m.map(({ y }) => y)),
          maxX: Math.max(...m.map(({ x }) => x)),
          maxY: Math.max(...m.map(({ y }) => y))
        })
      )(
        ((c) => c.reduce((o,e) => [ ...o, ...e ],[]).map(([x,y]) => ({ x, y })) )(coordinates)
      )
  })
  )(coord)
)

That has exactly the same output and is not nearly as unwieldy as the BSON operator statements required for an aggregation pipeline.

Upvotes: 1

Related Questions