ABDULLOKH MUKHAMMADJONOV

How can I fix this geospatial data so that I can create a 2dsphere index in MongoDB?

I have a collection that contains geospatial data, and I want to create a 2dsphere index on it. Here is some sample data (only the relevant part is shown):

[
    {
        "entity_number": "Q1765236089-1",
        "location": {
            "type": "Polygon",
            "coordinates": [
                [
                    [66.94697700059068, 39.37199847874891, 0],
                    [66.94713546519371, 39.37185880644635, 0],
                    [66.94738281423123, 39.37200984329424, 0],
                    [66.94721290635363, 39.37215291517258, 0],
                    [66.94721755688605, 39.37215367622946, 0],
                    [66.94697700059068, 39.37199847874891, 0]
                ]
            ]
        }
    },
    {
        "entity_number": "J1765212045-9",
        "location": {
            "type": "Polygon",
            "coordinates": [
                [
                    [66.64272448285637, 40.01845655540586, 0],
                    [66.64280212857551, 40.01849759761946, 0],
                    [66.64289492226342, 40.01841983943562, 0],
                    [66.6428077685561, 40.01837167238775, 0],
                    [66.64281005851667, 40.01837164697973, 0],
                    [66.64272448285637, 40.01845655540586, 0]
                ]
            ]
        }
    }
]

I am trying to create the index using this query:

db.Coordinates.createIndex({location:"2dsphere"})

This gives me the following error (for the document with entity_number "Q1765236089-1"):

"Edges 2 and 4 cross. Edge locations in degrees: [39.3720098, 66.9473828]-[39.3721529, 66.9472129] and [39.3721537, 66.9472176]-[39.3719985, 66.9469770]"
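The numbers in this message line up with the sample data if edge k is taken to run from coordinates[0][k] to coordinates[0][k + 1], and if each point is printed as [latitude, longitude], the reverse of GeoJSON's [longitude, latitude] order. The small sketch below reproduces the two edge locations under that reading; the helper name edgeInDegrees is hypothetical, and the numbering rule is inferred from this one message rather than taken from MongoDB's documentation.

```javascript
// Sketch: rebuild the "Edge locations in degrees" pairs from a GeoJSON ring.
// ASSUMPTION (inferred from the sample error): edge k joins ring[k] and ring[k + 1],
// and the error prints points as [lat, lng] instead of GeoJSON's [lng, lat].
function edgeInDegrees(ring, k) {
  const toLatLng = ([lng, lat]) => [lat, lng]; // swap order, drop the altitude
  return [toLatLng(ring[k]), toLatLng(ring[k + 1])];
}

// Outer ring of the "Q1765236089-1" sample document.
const ring = [
  [66.94697700059068, 39.37199847874891, 0],
  [66.94713546519371, 39.37185880644635, 0],
  [66.94738281423123, 39.37200984329424, 0],
  [66.94721290635363, 39.37215291517258, 0],
  [66.94721755688605, 39.37215367622946, 0],
  [66.94697700059068, 39.37199847874891, 0],
];

console.log(edgeInDegrees(ring, 2)); // edge 2: vertex 2 -> vertex 3
console.log(edgeInDegrees(ring, 4)); // edge 4: vertex 4 -> vertex 5 (the closing point)
```

Rounded to seven decimals, the output matches the two edges in the error. Since vertex 5 only repeats vertex 0 to close the ring, vertex 4 is the one "free" endpoint of edge 4.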

The error says "Edges 2 and 4 cross", so I tried removing the entry at index 4. The coordinates then became:

{
    "coordinates": [
        [
            [66.94697700059068, 39.37199847874891, 0],
            [66.94713546519371, 39.37185880644635, 0],
            [66.94738281423123, 39.37200984329424, 0],
            [66.94721290635363, 39.37215291517258, 0],
            [66.94697700059068, 39.37199847874891, 0]
        ]
    ]
}

This fixed the error for that document (entity_number "Q1765236089-1"). I then got the same error for other documents, and removing the indicated index fixed it each time.

I want to write a function that validates all of my data by removing the redundant entries from the coordinates arrays. For that, I need to know what calculations MongoDB performs during indexing. Any help or suggestions are appreciated.

(P.S. Some of my polygons have more than 4 sides; some have as many as 50.)

Answers (1)

user20042973

As @Charchit Kapoor mentioned in the comments, full validation is nontrivial. It seems unlikely that GeoJSON data with an arbitrary set of geometric problems can be corrected automatically.
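Detecting the specific problem reported here, as opposed to repairing arbitrary geometry, is more tractable. MongoDB's 2dsphere indexing is built on Google's S2 geometry library, which treats polygon edges as geodesics on a sphere. The sketch below is only an approximation of that check: it treats [lng, lat] as flat x/y coordinates, which should be close for small parcels like these but can disagree with MongoDB for very long edges or near the poles and the antimeridian. The function names (findCrossingEdges and its helpers) are hypothetical, and the sketch covers only the "edges cross" rule, not every rule MongoDB enforces.

```javascript
// Sketch: find pairs of non-adjacent edges that intersect in a closed GeoJSON ring.
// ASSUMPTION: planar approximation ([lng, lat] used as x/y); MongoDB (via S2)
// uses spherical edges, so results may differ for large or polar shapes.
// Edge k joins ring[k] and ring[k + 1], matching the numbering in the error message.

// Sign of the cross product (b - a) x (c - a): 1 = left turn, -1 = right turn, 0 = collinear.
function orient(a, b, c) {
  const v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
  return Math.sign(v);
}

// For a point p already known to be collinear with a-b: is it within the segment's bounding box?
function onSegment(a, b, p) {
  return Math.min(a[0], b[0]) <= p[0] && p[0] <= Math.max(a[0], b[0]) &&
         Math.min(a[1], b[1]) <= p[1] && p[1] <= Math.max(a[1], b[1]);
}

// Standard orientation-based segment intersection test (segments p1-p2 and p3-p4).
function segmentsIntersect(p1, p2, p3, p4) {
  const d1 = orient(p3, p4, p1);
  const d2 = orient(p3, p4, p2);
  const d3 = orient(p1, p2, p3);
  const d4 = orient(p1, p2, p4);
  if (d1 * d2 < 0 && d3 * d4 < 0) return true; // proper crossing
  // Touching or collinear-overlap cases.
  if (d1 === 0 && onSegment(p3, p4, p1)) return true;
  if (d2 === 0 && onSegment(p3, p4, p2)) return true;
  if (d3 === 0 && onSegment(p1, p2, p3)) return true;
  if (d4 === 0 && onSegment(p1, p2, p4)) return true;
  return false;
}

// Returns [[i, j], ...] for every pair of non-adjacent edges that intersect.
// Assumes the ring is closed (its last position repeats the first).
function findCrossingEdges(ring) {
  const n = ring.length - 1; // number of edges
  const crossings = [];
  for (let i = 0; i < n; i++) {
    for (let j = i + 2; j < n; j++) {
      if (i === 0 && j === n - 1) continue; // first and last edges share the closing vertex
      if (segmentsIntersect(ring[i], ring[i + 1], ring[j], ring[j + 1])) {
        crossings.push([i, j]);
      }
    }
  }
  return crossings;
}
```

Run over the outer rings of both sample documents, this reports [[2, 4]] for each, consistent with the problem sitting at the second-to-last coordinate. A ring with an empty result can still be rejected by MongoDB for other reasons, so the index build remains the final check.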

Notably, though, the problem was with the second-to-last coordinate in both examples given. How was this data generated? Was the final coordinate perhaps appended to some original data (which was already intended to represent a polygon) to satisfy the requirement that the "first and last coordinates must match in order to close the polygon"?

If so, then the problem may be relatively straightforward to correct. Effectively we want to remove the second-to-last entry (which I'm guessing originally represented the last coordinate). An example aggregation to perform such a change might look as follows:

[
  {
    $addFields: {
      "location.coordinates": [
        {
          $let: {
            vars: {
              arr: {
                "$arrayElemAt": [
                  "$location.coordinates",
                  0
                ]
              }
            },
            in: {
              "$concatArrays": [
                {
                  "$slice": [
                    "$$arr",
                    {
                      "$subtract": [
                        {
                          $size: "$$arr"
                        },
                        2
                      ]
                    }
                  ]
                },
                [
                  {
                    "$arrayElemAt": [
                      "$$arr",
                      0
                    ]
                  }
                ]
              ]
            }
          }
        }
      ]
    }
  }
]

It's a little verbose, but effectively it:

  1. Binds the first (outer) ring to the variable arr, using $arrayElemAt inside $let
  2. Uses $concatArrays to produce an array where:
    • The first part contains all of the entries in the existing ring except the last two
    • The final entry is the ring's first coordinate (identical to the existing last entry), which closes the polygon again.
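To check the logic outside the database, or to run the same change from a client-side script, the transformation can be mirrored in plain JavaScript. This is a sketch with a hypothetical helper name, dropSecondToLast; like the pipeline, it assumes a single-ring Polygon whose second-to-last vertex is the stray one.

```javascript
// Sketch mirroring the aggregation: keep every vertex of the first ring except the
// last two, then re-close the ring by appending its first vertex.
// ASSUMPTION: location.type is "Polygon" with exactly one ring, and the
// second-to-last vertex is the one to discard.
function dropSecondToLast(doc) {
  const arr = doc.location.coordinates[0];      // like $arrayElemAt: [coordinates, 0]
  const head = arr.slice(0, arr.length - 2);    // like $slice: [arr, size - 2]
  const fixedRing = head.concat([arr[0]]);      // like $concatArrays with the first vertex
  // Return a new document; the input is not mutated.
  return {
    ...doc,
    location: { ...doc.location, coordinates: [fixedRing] },
  };
}
```

Applied to the "Q1765236089-1" sample, it produces the same five-vertex ring that the question arrived at by hand.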

You can see a demonstration of using this in an update in this playground link.

After inserting the two corrected sample documents, the 2dsphere index is created successfully:

test> db.Coordinates.find()
[
  {
    _id: ObjectId("5a934e000102030405000000"),
    entity_number: 'Q1765236089-1',
    location: {
      coordinates: [
        [
          [ 66.94697700059068, 39.37199847874891, 0 ],
          [ 66.94713546519371, 39.37185880644635, 0 ],
          [ 66.94738281423123, 39.37200984329424, 0 ],
          [ 66.94721290635363, 39.37215291517258, 0 ],
          [ 66.94697700059068, 39.37199847874891, 0 ]
        ]
      ],
      type: 'Polygon'
    }
  },
  {
    _id: ObjectId("5a934e000102030405000001"),
    entity_number: 'J1765212045-9',
    location: {
      coordinates: [
        [
          [ 66.64272448285637, 40.01845655540586, 0 ],
          [ 66.64280212857551, 40.01849759761946, 0 ],
          [ 66.64289492226342, 40.01841983943562, 0 ],
          [ 66.6428077685561, 40.01837167238775, 0 ],
          [ 66.64272448285637, 40.01845655540586, 0 ]
        ]
      ],
      type: 'Polygon'
    }
  }
]
test> db.Coordinates.createIndex({location:"2dsphere"})
location_2dsphere

To reiterate, this approach should work regardless of how many sides the polygons have. However, it bakes in several assumptions. The biggest is that the second-to-last coordinate is always the problematic one, as noted above. Another is that every document is a polygon with a single ring (as opposed to, for example, a polygon with holes, which has multiple rings). If some of these assumptions don't hold, the operation could be adjusted. Others, such as which entry is problematic, cannot, as far as I'm aware, be addressed programmatically while also ensuring that the polygon still represents the intended shape.
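One way to make those assumptions explicit is to drop the second-to-last vertex only when a ring actually has crossing edges and removing that vertex clears them, and to flag everything else for manual review. The sketch below does this for every ring of a Polygon, so multi-ring polygons are covered too. It uses a flat [lng, lat] approximation as a pre-check (MongoDB's S2-based validation remains the final word), ignores touching/collinear edges for brevity, and its names (repairPolygon, hasCrossingEdges) and return shape are hypothetical.

```javascript
// Sketch: conservatively repair each ring of a GeoJSON Polygon's coordinates.
// ASSUMPTIONS: planar approximation of edges ([lng, lat] as x/y, unlike MongoDB's
// spherical S2 edges); rings are closed (last position equals the first); the only
// repair attempted is dropping the second-to-last vertex.

// Sign of the cross product (b - a) x (c - a).
function orient(a, b, c) {
  return Math.sign((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]));
}

// Proper crossings only (touching/collinear cases are ignored to keep the sketch short).
function crosses(p1, p2, p3, p4) {
  return orient(p3, p4, p1) * orient(p3, p4, p2) < 0 &&
         orient(p1, p2, p3) * orient(p1, p2, p4) < 0;
}

function hasCrossingEdges(ring) {
  const n = ring.length - 1; // edge count for a closed ring
  for (let i = 0; i < n; i++) {
    for (let j = i + 2; j < n; j++) {
      if (i === 0 && j === n - 1) continue; // these two edges share the closing vertex
      if (crosses(ring[i], ring[i + 1], ring[j], ring[j + 1])) return true;
    }
  }
  return false;
}

// Returns { coordinates, changed, unresolved } without mutating the input.
// unresolved lists the indexes of rings that still cross and need a human look.
function repairPolygon(coordinates) {
  let changed = false;
  const unresolved = [];
  const fixed = coordinates.map((ring, r) => {
    if (!hasCrossingEdges(ring)) return ring;
    // A GeoJSON linear ring needs at least 4 positions, so only try when one can be spared.
    if (ring.length >= 5) {
      const candidate = ring.slice(0, -2).concat([ring[0]]);
      if (!hasCrossingEdges(candidate)) {
        changed = true;
        return candidate;
      }
    }
    unresolved.push(r);
    return ring;
  });
  return { coordinates: fixed, changed, unresolved };
}
```

In a migration script, documents whose result has changed set to true could be written back, while any document with a non-empty unresolved list is left untouched for manual review rather than guessed at.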
