Harald Wiesinger
Harald Wiesinger

Reputation: 681

Aggregation $lookup for Geometry Values

i am trying to make a $lookup over 2 collections where the 1st collection has single points, in the 2nd collection are polygons... now i want to query all points within the polygon and build an avg value from the 1st collection.

my actual aggregation:

    'aggregation': {
        'pipeline': [
            # must match the field location from my first collection
            { "$match": {
                 "location" : "$loc" 
            } },

            # lookup to join location from 1st collection and geometry from 2nd collection
            { "$lookup": { 
                 "from": "polygons", 
                 "localField": "location", 
                 "foreignField": "geometry", 
                 "as": "join" } },

            # output grouped
            { "$group": { 
                "_id": "Average", 
                "Geometry": "$join.geometry" ,
                "AvgVal": { "$avg": "$myVal" }  , 
            "count": {"$sum": 1} } },
        ]
    }

i hope anybody is able to understand what i mean ;-)

edit: document 1:

{ id: "ABC", location: {type: "point", coordinates: [0,1]}, myVal: 1 },
{ id: "DEF", location: {type: "point", coordinates: [2,3]}, myVal: 2 }
{ id: "GHI", location: {type: "point", coordinates: [9,8]}, myVal: 3 }
{ id: "JKL", location: {type: "point", coordinates: [7,6]}, myVal: 4 }
{ id: "MNO", location: {type: "point", coordinates: [5,4]}, myVal: 5 }

document 2:

{ id: "Vienna Part1", geometry: {type: "polygon", coordinates: [[[0,1],[1,2],[2,3],[0,1]]] } },
{ id: "Vienna Part2", geometry: {type: "polygon", coordinates: [[[9,8],[7,6],[5,4],[9,8]]] } },

now i send a query with a polygon(map on my screen): $geoIntersect polygon.. [0,1]...[9,8]

expected:

{ id: "Vienna Part1", AvgVal: 1.5, geometry: {type: "polygon", coordinates: [[[0,1],[1,2],[2,3],[0,1]]] }
{ id: "Vienna Part2", AvgVal: 4, geometry: {type: "polygon", coordinates: [[[9,8],[7,6],[5,4],[9,8]]] }

lg Harald

Upvotes: 0

Views: 1074

Answers (1)

Blakes Seven
Blakes Seven

Reputation: 50436

You don't seem to be understanding that the concept of $lookup is that for a field to "match" on lookup it must have the same data. Since the data contained in one document is "double nested" in an array in "Polygon" notation, it is necessary to $unwind the array content "twice" in order to get the "match".

Setting this up as a whole example, let's create one collection with your "point" documents:

db.geo1.insertMany([
  { _id: "ABC", location: {type: "point", coordinates: [0,1]}, myVal: 1 },
  { _id: "DEF", location: {type: "point", coordinates: [2,3]}, myVal: 2 },
  { _id: "GHI", location: {type: "point", coordinates: [9,8]}, myVal: 3 },
  { _id: "JKL", location: {type: "point", coordinates: [7,6]}, myVal: 4 },
  { _id: "MNO", location: {type: "point", coordinates: [5,4]}, myVal: 5 }
])

And another including the "polygon" documents, deliberately including one that will not match:

db.geo2.insertMany([
  { _id: "Vienna Part1", geometry: {type: "polygon", coordinates: [[[0,1],[1,2],[2,3],[0,1]]] } },
  { _id: "Vienna Part2", geometry: {type: "polygon", coordinates: [[[9,8],[7,6],[5,4],[9,8]]] } },
  { _id: "Vienna Part3", geometry: {type: "polygon", coordinates: [[[10,1],[10,3],[3,10],[10,1]]] } }
])

In order to inspect the geo2 collection for items in geo1 that "intersect" via the same point coordinates then you must do:

db.geo2.aggregate([
  { "$unwind": "$geometry.coordinates" },
  { "$unwind": "$geometry.coordinates" },
  { "$lookup": {
    "from": "geo1",
    "localField": "geometry.coordinates",
    "foreignField": "location.coordinates",
    "as": "geo1"
  }},
  { "$group": {
    "_id": "$_id",
    "coordinates": {
      "$push": "$geometry.coordinates"
    },
    "matches": {
      "$push": { "$ne": [ "$geo1", [] ] }
    }
  }},
  { "$redact": {
    "$cond": {
      "if": { "$anyElementTrue": "$matches" },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$project" : {
    "geometry": {
      "type": { "$literal": "polygon" },
      "coordinates": ["$coordinates"]
    }
  }},
  { "$sort": { "_id": 1 } }
])

So the first thing is the double $unwind to just get the "point" data out of the arrays. Then you can do the $lookup.

{ "_id" : "Vienna Part1", "geometry" : { "type" : "polygon", "coordinates" : [ 0, 1 ] }, "geo1" : [ { "_id" : "ABC", "location" : { "type" : "point", "coordinates" : [ 0, 1 ] }, "myVal" : 1 } ] }
{ "_id" : "Vienna Part1", "geometry" : { "type" : "polygon", "coordinates" : [ 1, 2 ] }, "geo1" : [ ] }
{ "_id" : "Vienna Part1", "geometry" : { "type" : "polygon", "coordinates" : [ 2, 3 ] }, "geo1" : [ { "_id" : "DEF", "location" : { "type" : "point", "coordinates" : [ 2, 3 ] }, "myVal" : 2 } ] }
{ "_id" : "Vienna Part1", "geometry" : { "type" : "polygon", "coordinates" : [ 0, 1 ] }, "geo1" : [ { "_id" : "ABC", "location" : { "type" : "point", "coordinates" : [ 0, 1 ] }, "myVal" : 1 } ] }
{ "_id" : "Vienna Part2", "geometry" : { "type" : "polygon", "coordinates" : [ 9, 8 ] }, "geo1" : [ { "_id" : "GHI", "location" : { "type" : "point", "coordinates" : [ 9, 8 ] }, "myVal" : 3 } ] }
{ "_id" : "Vienna Part2", "geometry" : { "type" : "polygon", "coordinates" : [ 7, 6 ] }, "geo1" : [ { "_id" : "JKL", "location" : { "type" : "point", "coordinates" : [ 7, 6 ] }, "myVal" : 4 } ] }
{ "_id" : "Vienna Part2", "geometry" : { "type" : "polygon", "coordinates" : [ 5, 4 ] }, "geo1" : [ { "_id" : "MNO", "location" : { "type" : "point", "coordinates" : [ 5, 4 ] }, "myVal" : 5 } ] }
{ "_id" : "Vienna Part2", "geometry" : { "type" : "polygon", "coordinates" : [ 9, 8 ] }, "geo1" : [ { "_id" : "GHI", "location" : { "type" : "point", "coordinates" : [ 9, 8 ] }, "myVal" : 3 } ] }
{ "_id" : "Vienna Part3", "geometry" : { "type" : "polygon", "coordinates" : [ 10, 1 ] }, "geo1" : [ ] }
{ "_id" : "Vienna Part3", "geometry" : { "type" : "polygon", "coordinates" : [ 10, 3 ] }, "geo1" : [ ] }
{ "_id" : "Vienna Part3", "geometry" : { "type" : "polygon", "coordinates" : [ 3, 10 ] }, "geo1" : [ ] }
{ "_id" : "Vienna Part3", "geometry" : { "type" : "polygon", "coordinates" : [ 10, 1 ] }, "geo1" : [ ] }

The resulting data is showing the "matched" element(s) from the geo1 collection within the new array element "geo1" in the documents. This is a kind of "left join", so there is either an array with any match(es) or an empty array.

Your next point is we want to $group back to something more like the original document form, and here we test to see if the result of $lookup produced an empty array or not. This means that the corresponding document was "not found" in the other collection:

{ 
    "_id" : "Vienna Part3",
    "coordinates" : [ [ 10, 1 ], [ 10, 3 ], [ 3, 10 ], [ 10, 1 ] ],
    "matches" : [ false, false, false, false ]
}
{ 
    "_id" : "Vienna Part2",
    "coordinates" : [ [ 9, 8 ], [ 7, 6 ], [ 5, 4 ], [ 9, 8 ] ],
    "matches" : [ true, true, true, true ]
}
{ 
    "_id" : "Vienna Part1", 
    "coordinates" : [ [ 0, 1 ], [ 1, 2 ], [ 2, 3 ], [ 0, 1 ] ],
    "matches" : [ true, false, true, true ]
}

What you should see here is that one of the documents has a resulting "matches" array where every element is false. This is important for the following $redact condition, which does a test on that array with $anyElementTrue.

So where the array is seen in the document where all elements are false, this means that there was "no intersection" for this particular set of points. As such, that document will be discarded via $$PRUNE where the other documents containing "at least one" true would be kept.

The only thing remaining is a little cosmetic tranformation to give you back the desired result:

{ 
    "_id" : "Vienna Part1", 
    "geometry": {
        "type": "polygon",
        "coordinates" : [[ [ 0, 1 ], [ 1, 2 ], [ 2, 3 ], [ 0, 1 ] ]]
    }
}
{ 
    "_id" : "Vienna Part2",
    "geometry": {
        "type": "polygon",
        "coordinates" : [[ [ 9, 8 ], [ 7, 6 ], [ 5, 4 ], [ 9, 8 ] ]]
    }
}

Of course, different geometry types complicates the process a little more, but that is still the basic process.

  1. In order to $lookup you must transform the data into a form that would match the target collection field being inspected.

  2. When looking at results in a "set", you can test each element for a logical result and then evaluate with $anyElementTrue. This will tell you if something matched without destroying the individual elements.

So as well as the first condition, the second condition makes sure that your "Vienna Part1" document which contains points that do not match "all" of the documents in the other collection is also a match because "at least one" element had a true evaluation for a match.

Those are the rules. So I hope you understand it better now.

Upvotes: 2

Related Questions