Reputation: 8155
I am trying to merge two geodataframes (want to see which polygon each point is in).
The following code gets me a warning first ("CRS does not match!
")
and then an error ("RTreeError: Coordinates must not have minimums more than maximums
").
What exactly is wrong in there? Are CRS coordinates systems? If so, why are they not loaded the same way?
import geopandas as gpd
from shapely.geometry import Point, mapping,shape
from geopandas import GeoDataFrame, read_file
#from geopandas.tools import overlay
from geopandas.tools import sjoin
print('Reading points...')
points=pd.read_csv(points_csv)
points['geometry'] = points.apply(lambda z: Point(z.Latitude, z.Longitude), axis=1)
PointsGeodataframe = gpd.GeoDataFrame(points)
print PointsGeodataframe.head()
print('Reading polygons...')
PolygonsGeodataframe = gpd.GeoDataFrame.from_file(china_shapefile+".shp")
print PolygonsGeodataframe.head()
print('Merging GeoDataframes...')
merged=sjoin(PointsGeodataframe, PolygonsGeodataframe, how='left', op='intersects')
#merged = PointsGeodataframe.merge(PolygonsGeodataframe, left_on='iso_alpha2', right_on='ISO2', how='left')
print(merged.head(5))
Link to data for reproduction: Shapefile, GPS points
Upvotes: 3
Views: 9218
Reputation: 71
I'll add an answer here since I was recently struggling with this and couldn't find a great answer here. The Geopandas documentation has a good example for how to solve the "CRS does not match" issue.
I copied the entire code chunk from the documentation below, but the most relevant line is this one, where the to_crs()
method is used to reproject a geodataframe. You can call mygeodataframe.crs
to find the CRS of each dataframe, and then to_crs()
to reproject one to match the other, like so:
world = world.to_crs({'init': 'epsg:3395'})
Simply setting PointsGeodataframe.crs = PolygonsGeodataframe.crs
will stop the error from being thrown, but will not correctly reproject the geometry.
Full documentation code for reference:
# load example data
In [1]: world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Check original projection
# (it's Platte Carre! x-y are long and lat)
In [2]: world.crs
Out[2]: {'init': 'epsg:4326'}
# Visualize
In [3]: ax = world.plot()
In [4]: ax.set_title("WGS84 (lat/lon)");
# Reproject to Mercator (after dropping Antartica)
In [5]: world = world[(world.name != "Antarctica") & (world.name != "Fr. S. Antarctic Lands")]
In [6]: world = world.to_crs({'init': 'epsg:3395'}) # world.to_crs(epsg=3395) would also work
In [7]: ax = world.plot()
In [8]: ax.set_title("Mercator");
Upvotes: 1
Reputation: 1165
As noted in the comments on the question, you can eliminate the CRS does not match!
warning by manually setting PointsGeodataframe.crs = PolygonsGeodataframe.crs
(assuming the CRSs are indeed the same for both datasets).
However, that doesn't address the RTreeError
. It's possible that you have missing lat/lon data in points_csv
- in that case you would end up creating Point
objects containing NaN values (i.e. Point(nan nan)
), which go on to cause issues in rtree
. I had a similar problem and the fix was just to filter out rows with missing coordinate data when loading in the CSV:
points=pd.read_csv(points_csv).dropna(subset=["Latitude", "Longitude"])
Upvotes: 1