Reputation: 41
Good evening, I'm working on a product to detect local events (strike) within subscription areas.
The yellow polygons should be 40KM (left) and 50KM (right) circles around central red points. Green points are my strikes that should be detected in my process.
It appears that my current use of buffer() does not produce 40/50 Km buffer radius as expected and then my process in missing my two events .
My code:
# Create my two events to detect
df_strike = pd.DataFrame(
{ 'Latitude': [27.0779, 31.9974],
'Longitude': [51.5144, 38.7078]})
gdf_events = gpd.GeoDataFrame(df_strike, geometry=gpd.points_from_xy(df_strike.Longitude, df_strike.Latitude),crs = {'init':'epsg:4326'})
# Get location to create buffer
SUB_LOCATION = pd.DataFrame(
{ 'perimeter_id': [1370, 13858],
'distance' : [40.0, 50.0],
'custom_lat': [31.6661, 26.6500],
'custom_lon': [38.6635, 51.5700]})
gdf_locations = gpd.GeoDataFrame(SUB_LOCATION, geometry=gpd.points_from_xy(SUB_LOCATION.custom_lon, SUB_LOCATION.custom_lat), crs = {'init':'epsg:4326'})
# Now reproject to a crs using meters
gdf_locations = gdf_locations.to_crs({'init':'epsg:3857'})
gdf_events = gdf_events.to_crs({'init':'epsg:3857'})
# Create buffer using distance (in meters) from locations
gdf_locations['geometry'] = gdf_locations['geometry'].buffer(gdf_locations['distance']*1000)
# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
But my result is an empty dataframe and should not be. If I compute distance between events and locations (distance between red and green points):
pnt1 = Point(27.0779, 51.5144)
pnt2 = Point(26.65, 51.57)
points_df = gpd.GeoDataFrame({'geometry': [pnt1, pnt2]}, crs='EPSG:4326')
points_df = points_df.to_crs('EPSG:3857')
points_df2 = points_df.shift() #We shift the dataframe by 1 to align pnt1 with pnt2
points_df.distance(points_df2)
Returns: 48662.078723 meters
and
pnt1 = Point(31.9974, 38.7078)
pnt2 = Point(31.6661, 38.6635)
points_df = gpd.GeoDataFrame({'geometry': [pnt1, pnt2]}, crs='EPSG:4326')
points_df = points_df.to_crs('EPSG:3857')
points_df2 = points_df.shift() #We shift the dataframe by 1 to align pnt1 with pnt2
points_df.distance(points_df2)
Returns: 37417.343796 meters
Then I was expecting to have this result :
>>> pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
subscriber_id perimeter_id distance custom_lat custom_lon geometry index_right Latitude Longitude
0 19664 1370 40.0 31.6661 38.6635 POLYGON ((2230301.324 3642618.584, 2230089.452... 1 31.9974 38.7078
1 91201 13858 50.0 26.6500 51.5700 POLYGON ((3684499.890 3347425.378, 3684235.050... 0 27.0779 51.5144
I think my buffer is at ~47KM and ~38KM instead of 50KM and 40KM as expected. Am I missing something here which could explain that empty result ?
Upvotes: 1
Views: 1014
Reputation: 18782
Certain computations with geodataframe's methods that involves distances, namely, .distance()
, .buffer()
in this particular case, are based on Euclidean geometry and map projection coordinate systems. Their results are not reliable, to always get the correct results one should avoid using them and use direct computation with geographic coordinates instead. Doing so with proper module/library, you will get great-circle arc distances instead of errorneous euclidean distances. Thus avoid mysterious (projection) errors.
Here I present the runnable code that show how to proceed along the line that I proposed:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon
import cartopy.crs as ccrs
import cartopy
import matplotlib.pyplot as plt
import numpy as np
from pyproj import Geod
# Create my two events to detect
df_strike = pd.DataFrame(
{ 'Latitude': [27.0779, 31.9974],
'Longitude': [51.5144, 38.7078]})
gdf_events = gpd.GeoDataFrame(df_strike, geometry=gpd.points_from_xy(df_strike.Longitude, df_strike.Latitude),crs = {'init':'epsg:4326'})
# Get location to create buffer
SUB_LOCATION = pd.DataFrame(
{ 'perimeter_id': [1370, 13858],
'distance' : [40.0, 50.0],
'custom_lat': [31.6661, 26.6500],
'custom_lon': [38.6635, 51.5700]})
gdf_locations = gpd.GeoDataFrame(SUB_LOCATION, geometry=gpd.points_from_xy(SUB_LOCATION.custom_lon, SUB_LOCATION.custom_lat), crs = {'init':'epsg:4326'})
# Begin: My code----------------
def point_buffer(lon, lat, radius_m):
# Use this instead of `.buffer()` provided by geodataframe
# Adapted from:
# https://stackoverflow.com/questions/31492220/how-to-plot-a-tissot-with-cartopy-and-matplotlib
geod = Geod(ellps='WGS84')
num_vtxs = 64
lons, lats, _ = geod.fwd(np.repeat(lon, num_vtxs),
np.repeat(lat, num_vtxs),
np.linspace(360, 0, num_vtxs),
np.repeat(radius_m, num_vtxs),
radians=False
)
return Polygon(zip(lons, lats))
# Get location to create buffer
# Create buffer geometries from points' coordinates and distances using ...
# special function `point_buffer()` defined above
gdf_locations['geometry'] = gdf_locations.apply(lambda row : point_buffer(row.custom_lon, row.custom_lat, 1000*row.distance), axis=1)
# Convert CRS to Mercator (epsg:3395), it will match `ccrs.Mercator()`
# Do not use Web_Mercator (epsg:3857), it is crude approx of 3395
gdf_locations = gdf_locations.to_crs({'init':'epsg:3395'})
gdf_events = gdf_events.to_crs({'init':'epsg:3395'})
# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
# Visualization
# Use cartopy for best result
fig = plt.figure(figsize=(9,8))
ax = fig.add_subplot(projection=ccrs.Mercator())
gdf_locations.plot(color="green", ax=ax, alpha=0.4)
gdf_events.plot(color="red", ax=ax, alpha=0.9, zorder=23)
ax.coastlines(lw=0.3, color="gray")
ax.add_feature(cartopy.feature.LAND)
ax.add_feature(cartopy.feature.OCEAN)
ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True)
# Other helpers
# Horiz/vert lines are plotted to mark the circles' centers
ax.hlines([31.6661,26.6500], 30, 60, transform=ccrs.PlateCarree(), lw=0.1)
ax.vlines([38.6635, 51.5700], 20, 35, transform=ccrs.PlateCarree(), lw=0.1)
ax.set_extent([35, 55, 25, 33], crs=ccrs.PlateCarree())
Spatial joining:
# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
matching_entln[["perimeter_id", "distance", "index_right", "Latitude", "Longitude"]] #custom_lat custom_lon
Compute distances between points for checking
This checks the result of the spatial join if computed distances are less than the buffered distances.
# Use greatcircle arc length
geod = Geod(ellps='WGS84')
# centers of buffered-circles
from_lon1, from_lon2 = [38.6635, 51.5700]
from_lat1, from_lat2 = [31.6661, 26.6500]
# event locations
to_lon1, to_lon2= [51.5144, 38.7078]
to_lat1, to_lat2 = [27.0779, 31.9974]
_,_, dist_m = geod.inv(from_lon1, from_lat1, to_lon2, to_lat2, radians=False)
print(dist_m) #smaller than 40 km == inside
# Get: 36974.419811328786 m.
_,_, dist_m = geod.inv(from_lon2, from_lat2, to_lon1, to_lat1, radians=False)
print(dist_m) #smaller than 50 km == inside
# Get: 47732.76744655724 m.
My notes
Upvotes: 2