Alexis Eggermont
Alexis Eggermont

Reputation: 8155

Merging geodataframes in geopandas (CRS do not match)

I am trying to merge two geodataframes (want to see which polygon each point is in).

The following code gets me a warning first ("CRS does not match!") and then an error ("RTreeError: Coordinates must not have minimums more than maximums").

What exactly is wrong in there? Are CRS coordinates systems? If so, why are they not loaded the same way?

import geopandas as gpd
from shapely.geometry import Point, mapping,shape
from geopandas import GeoDataFrame, read_file
#from geopandas.tools import overlay
from geopandas.tools import sjoin

print('Reading points...')
points=pd.read_csv(points_csv)
points['geometry'] = points.apply(lambda z: Point(z.Latitude, z.Longitude), axis=1)
PointsGeodataframe = gpd.GeoDataFrame(points)
print PointsGeodataframe.head()
print('Reading polygons...')
PolygonsGeodataframe = gpd.GeoDataFrame.from_file(china_shapefile+".shp")
print PolygonsGeodataframe.head()
print('Merging GeoDataframes...')
merged=sjoin(PointsGeodataframe, PolygonsGeodataframe, how='left', op='intersects')

#merged = PointsGeodataframe.merge(PolygonsGeodataframe, left_on='iso_alpha2', right_on='ISO2', how='left')
print(merged.head(5))

Link to data for reproduction: Shapefile, GPS points

Upvotes: 3

Views: 9218

Answers (2)

sawyer
sawyer

Reputation: 71

I'll add an answer here since I was recently struggling with this and couldn't find a great answer here. The Geopandas documentation has a good example for how to solve the "CRS does not match" issue.

I copied the entire code chunk from the documentation below, but the most relevant line is this one, where the to_crs() method is used to reproject a geodataframe. You can call mygeodataframe.crs to find the CRS of each dataframe, and then to_crs() to reproject one to match the other, like so:

world = world.to_crs({'init': 'epsg:3395'})

Simply setting PointsGeodataframe.crs = PolygonsGeodataframe.crs will stop the error from being thrown, but will not correctly reproject the geometry.

Full documentation code for reference:

# load example data
In [1]: world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

# Check original projection
# (it's Platte Carre! x-y are long and lat)
In [2]: world.crs
Out[2]: {'init': 'epsg:4326'}

# Visualize
In [3]: ax = world.plot()

In [4]: ax.set_title("WGS84 (lat/lon)");

# Reproject to Mercator (after dropping Antartica)
In [5]: world = world[(world.name != "Antarctica") & (world.name != "Fr. S. Antarctic Lands")]

In [6]: world = world.to_crs({'init': 'epsg:3395'}) # world.to_crs(epsg=3395) would also work

In [7]: ax = world.plot()

In [8]: ax.set_title("Mercator");

Upvotes: 1

Gord Stephen
Gord Stephen

Reputation: 1165

As noted in the comments on the question, you can eliminate the CRS does not match! warning by manually setting PointsGeodataframe.crs = PolygonsGeodataframe.crs (assuming the CRSs are indeed the same for both datasets).

However, that doesn't address the RTreeError. It's possible that you have missing lat/lon data in points_csv - in that case you would end up creating Point objects containing NaN values (i.e. Point(nan nan)), which go on to cause issues in rtree. I had a similar problem and the fix was just to filter out rows with missing coordinate data when loading in the CSV:

points=pd.read_csv(points_csv).dropna(subset=["Latitude", "Longitude"])

Upvotes: 1

Related Questions