MongoDb/GeoJson: MultiPolygon vs GeometryCollection containing only polygons

Question

I'm collecting location information from different sources and storing everything in a MongoDb collection. Apart from point locations with a single lat/lng coordinate pair, I'm also storing areas.

Now, one data source gives me the location information as a GeometryCollection, but with all elements being Polygons. Another data source gives me the location as a MultiPolygon. While I'm actually considering having a collection for each data source, I'm wondering which approach is better on the whole.

GeometryCollection is certainly more flexible, but maybe MultiPolygon shows better query performance (given that I always create a 2dsphere index over the location field). Is it worth converting one representation into the other?

Buzz Moschetti · Accepted Answer

Good news: query performance and indexability are the same in MongoDB for all supported GeoJSON types.

The main driver in your decision should be whether your information architecture for the geo field, and the software that consumes it, needs to hold more types than just polygons. You say you're storing point locations? If you want to hold all geo data in a single field, e.g. location (likely with a 2dsphere index on it), then you will need a GeometryCollection, into which you can put the Point and the MultiPolygon. The GeoJSON spec (https://www.rfc-editor.org/rfc/rfc7946#page-9) recommends not nesting GeometryCollection objects, so for those data sources giving you a GeometryCollection, you would iterate over its contents and populate your own GeometryCollection, which also holds your Points etc.
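As a rough illustration of that "iterate and repopulate" step, here is a minimal Python sketch that works on plain dicts only. The helper names flatten_geometries and build_location are made up for this example, and the coordinates are arbitrary sample values in GeoJSON [longitude, latitude] order.

```python
def flatten_geometries(geometry):
    """Yield non-collection geometries, unpacking any nested GeometryCollection."""
    if geometry["type"] == "GeometryCollection":
        for member in geometry["geometries"]:
            yield from flatten_geometries(member)
    else:
        yield geometry


def build_location(points, source_geometries):
    """Build one flat GeometryCollection from points plus source geometries.

    points: iterable of (lng, lat) pairs.
    source_geometries: GeoJSON geometry dicts, possibly GeometryCollections.
    """
    geometries = [{"type": "Point", "coordinates": [lng, lat]} for lng, lat in points]
    for geometry in source_geometries:
        geometries.extend(flatten_geometries(geometry))
    return {"type": "GeometryCollection", "geometries": geometries}


# Sample input: a source that delivers a GeometryCollection of one Polygon.
square = [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
source = {
    "type": "GeometryCollection",
    "geometries": [{"type": "Polygon", "coordinates": square}],
}

location = build_location([(0.5, 0.5)], [source])
```

The resulting location value has no nested collections, so it can be stored in a single 2dsphere-indexed field alongside your points.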

If you are storing points separately, e.g. eventCenter as separate from eventAreasEffected, then the eventCenter can be just a Point and the eventAreasEffected can be a single MultiPolygon; no need for GeometryCollection. It is perfectly fine to have geo data in more than one field, with or without a 2dsphere index on each of them. Starting in MongoDB 4.0, you can use $geoNear on a collection that has more than one 2dsphere index by including the key option.
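To make the key option concrete, here is a small Python sketch that only builds the aggregation pipeline as plain dicts. The helper geo_near_stage, the output field name distanceMeters, and the sample coordinates are assumptions for illustration; near, distanceField, key, spherical and maxDistance are the $geoNear options being used.

```python
def geo_near_stage(field, lng, lat, max_distance_m=None):
    """Build a $geoNear stage that names the indexed field to use via `key`."""
    options = {
        "near": {"type": "Point", "coordinates": [lng, lat]},  # [longitude, latitude]
        "distanceField": "distanceMeters",  # hypothetical name for the computed distance
        "key": field,                       # which 2dsphere-indexed field to search
        "spherical": True,
    }
    if max_distance_m is not None:
        options["maxDistance"] = max_distance_m  # meters when `near` is GeoJSON
    return {"$geoNear": options}


# $geoNear has to be the first stage of the pipeline.
pipeline = [geo_near_stage("eventCenter", -73.99, 40.73, max_distance_m=5000)]

# With pymongo and a hypothetical `events` collection, this would be run
# roughly as: db.events.aggregate(pipeline)
```

Passing "eventAreasEffected" instead of "eventCenter" would point the same stage at the other indexed field.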

Here's an unofficial but reasonable definitional approach: a MultiPolygon is not an arbitrary collection of Polygons but rather a single "shape concept" that happens to consist of disjoint polygons. The United States can be described in a single MultiPolygon that has Alaska, Hawaii, the continental US, maybe Puerto Rico, etc. To this end, you'll note that it is a little trickier to store data relevant to each member of the MultiPolygon, because coordinates can only be nested arrays of positions. Information about the third polygon, for example, has to be carried in a peer field to the single top-level coordinates field. A discrete array of Polygons or a GeometryCollection of Polygons, by contrast, can store extra information in each shape. Note that neither GeoJSON nor MongoDB restricts you from adding fields in addition to type and coordinates for each shape.
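If you do decide to convert one representation into the other, the structural difference described above fits in a few lines. This is a sketch on plain dicts, not an official conversion routine; the function names and the per-polygon label field are invented for the example.

```python
def collection_to_multipolygon(collection):
    """Merge a GeometryCollection holding only Polygons into one MultiPolygon.

    Any extra per-polygon fields are dropped, because a MultiPolygon has
    only one top-level coordinates array.
    """
    members = collection["geometries"]
    if any(member["type"] != "Polygon" for member in members):
        raise ValueError("only Polygon members can be merged into a MultiPolygon")
    return {
        "type": "MultiPolygon",
        "coordinates": [member["coordinates"] for member in members],
    }


def multipolygon_to_collection(multipolygon, labels=None):
    """Split a MultiPolygon into a GeometryCollection of Polygons.

    labels: optional list with one entry per polygon, stored in a
    hypothetical extra field named "label" on each member.
    """
    members = []
    for index, rings in enumerate(multipolygon["coordinates"]):
        member = {"type": "Polygon", "coordinates": rings}
        if labels is not None:
            member["label"] = labels[index]
        members.append(member)
    return {"type": "GeometryCollection", "geometries": members}


# Two disjoint sample squares, each with a single exterior ring.
square_a = [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
square_b = [[[5, 5], [6, 5], [6, 6], [5, 6], [5, 5]]]

multi = {"type": "MultiPolygon", "coordinates": [square_a, square_b]}
as_collection = multipolygon_to_collection(multi, labels=["a", "b"])
round_trip = collection_to_multipolygon(as_collection)
```

The round trip preserves the coordinates but loses the labels, which is exactly the trade-off between the two representations.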

A more subtle issue is the design and semantics of a GeometryCollection of Polygon vs. MultiPolygon. To further complicate it, there is the issue of explicit holes defined in the Polygon vs. a collection of implicitly "layered" Polygon that are post-processed outside of the DB by geo software.
