Guimeteo
Guimeteo

Reputation: 41

Wrong points buffer using geopandas

Good evening, I'm working on a product to detect local events (strike) within subscription areas.

The yellow polygons should be 40KM (left) and 50KM (right) circles around central red points. Green points are my strikes that should be detected in my process.

enter image description here

It appears that my current use of buffer() does not produce 40/50 Km buffer radius as expected and then my process in missing my two events .

My code:

# Create my two events to detect
df_strike = pd.DataFrame(
    { 'Latitude': [27.0779, 31.9974],
     'Longitude': [51.5144, 38.7078]})
gdf_events = gpd.GeoDataFrame(df_strike, geometry=gpd.points_from_xy(df_strike.Longitude, df_strike.Latitude),crs = {'init':'epsg:4326'})

# Get location to create buffer
SUB_LOCATION = pd.DataFrame(
        { 'perimeter_id': [1370, 13858],
            'distance'  : [40.0, 50.0],
            'custom_lat': [31.6661, 26.6500],
            'custom_lon': [38.6635, 51.5700]})

gdf_locations  = gpd.GeoDataFrame(SUB_LOCATION, geometry=gpd.points_from_xy(SUB_LOCATION.custom_lon, SUB_LOCATION.custom_lat), crs = {'init':'epsg:4326'})

# Now reproject to a crs using meters
gdf_locations = gdf_locations.to_crs({'init':'epsg:3857'})
gdf_events = gdf_events.to_crs({'init':'epsg:3857'})

# Create buffer using distance (in meters) from locations 
gdf_locations['geometry'] = gdf_locations['geometry'].buffer(gdf_locations['distance']*1000)

# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))

But my result is an empty dataframe and should not be. If I compute distance between events and locations (distance between red and green points):

pnt1 = Point(27.0779, 51.5144)
pnt2 = Point(26.65, 51.57)
points_df = gpd.GeoDataFrame({'geometry': [pnt1, pnt2]}, crs='EPSG:4326')
points_df = points_df.to_crs('EPSG:3857')
points_df2 = points_df.shift() #We shift the dataframe by 1 to align pnt1 with pnt2
points_df.distance(points_df2)

Returns: 48662.078723 meters

and

pnt1 = Point(31.9974, 38.7078)
pnt2 = Point(31.6661, 38.6635)
points_df = gpd.GeoDataFrame({'geometry': [pnt1, pnt2]}, crs='EPSG:4326')
points_df = points_df.to_crs('EPSG:3857')
points_df2 = points_df.shift() #We shift the dataframe by 1 to align pnt1 with pnt2
points_df.distance(points_df2)

Returns: 37417.343796 meters

Then I was expecting to have this result :

>>> pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
   subscriber_id  perimeter_id  distance  custom_lat  custom_lon                                           geometry  index_right  Latitude  Longitude
0          19664          1370      40.0     31.6661     38.6635  POLYGON ((2230301.324 3642618.584, 2230089.452...            1   31.9974    38.7078
1          91201         13858      50.0     26.6500     51.5700  POLYGON ((3684499.890 3347425.378, 3684235.050...            0   27.0779    51.5144

I think my buffer is at ~47KM and ~38KM instead of 50KM and 40KM as expected. Am I missing something here which could explain that empty result ?

Upvotes: 1

Views: 1014

Answers (1)

swatchai
swatchai

Reputation: 18782

Certain computations with geodataframe's methods that involves distances, namely, .distance(), .buffer() in this particular case, are based on Euclidean geometry and map projection coordinate systems. Their results are not reliable, to always get the correct results one should avoid using them and use direct computation with geographic coordinates instead. Doing so with proper module/library, you will get great-circle arc distances instead of errorneous euclidean distances. Thus avoid mysterious (projection) errors.

Here I present the runnable code that show how to proceed along the line that I proposed:

import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

import cartopy.crs as ccrs
import cartopy
import matplotlib.pyplot as plt

import numpy as np
from pyproj import Geod

# Create my two events to detect
df_strike = pd.DataFrame(
    { 'Latitude': [27.0779, 31.9974],
     'Longitude': [51.5144, 38.7078]})
gdf_events = gpd.GeoDataFrame(df_strike, geometry=gpd.points_from_xy(df_strike.Longitude, df_strike.Latitude),crs = {'init':'epsg:4326'})

# Get location to create buffer
SUB_LOCATION = pd.DataFrame(
        { 'perimeter_id': [1370, 13858],
            'distance'  : [40.0, 50.0],
            'custom_lat': [31.6661, 26.6500],
            'custom_lon': [38.6635, 51.5700]})

gdf_locations  = gpd.GeoDataFrame(SUB_LOCATION, geometry=gpd.points_from_xy(SUB_LOCATION.custom_lon, SUB_LOCATION.custom_lat), crs = {'init':'epsg:4326'})


# Begin: My code----------------
def point_buffer(lon, lat, radius_m):
    # Use this instead of `.buffer()` provided by geodataframe
    # Adapted from:
    # https://stackoverflow.com/questions/31492220/how-to-plot-a-tissot-with-cartopy-and-matplotlib
    geod = Geod(ellps='WGS84')
    num_vtxs = 64
    lons, lats, _ = geod.fwd(np.repeat(lon, num_vtxs),
                                     np.repeat(lat, num_vtxs),
                                     np.linspace(360, 0, num_vtxs),
                                     np.repeat(radius_m, num_vtxs),
                                     radians=False
                            )
    return Polygon(zip(lons, lats))

# Get location to create buffer
# Create buffer geometries from points' coordinates and distances using ...
#   special function `point_buffer()` defined above
gdf_locations['geometry'] = gdf_locations.apply(lambda row : point_buffer(row.custom_lon, row.custom_lat, 1000*row.distance), axis=1)

# Convert CRS to Mercator (epsg:3395), it will match `ccrs.Mercator()`
# Do not use Web_Mercator (epsg:3857), it is crude approx of 3395
gdf_locations = gdf_locations.to_crs({'init':'epsg:3395'})
gdf_events = gdf_events.to_crs({'init':'epsg:3395'})

# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))

# Visualization
# Use cartopy for best result
fig = plt.figure(figsize=(9,8))
ax = fig.add_subplot(projection=ccrs.Mercator())

gdf_locations.plot(color="green", ax=ax, alpha=0.4)
gdf_events.plot(color="red", ax=ax, alpha=0.9, zorder=23)

ax.coastlines(lw=0.3, color="gray")
ax.add_feature(cartopy.feature.LAND)
ax.add_feature(cartopy.feature.OCEAN)
ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True) 

# Other helpers
# Horiz/vert lines are plotted to mark the circles' centers
ax.hlines([31.6661,26.6500], 30, 60, transform=ccrs.PlateCarree(), lw=0.1)
ax.vlines([38.6635, 51.5700], 20, 35, transform=ccrs.PlateCarree(), lw=0.1)

ax.set_extent([35, 55, 25, 33], crs=ccrs.PlateCarree())

pnts-in-ccles

Spatial joining:

# Matching events within buffer
matching_entln = pd.DataFrame(gpd.sjoin(gdf_locations, gdf_events, how='inner'))
matching_entln[["perimeter_id", "distance", "index_right", "Latitude", "Longitude"]]  #custom_lat   custom_lon

join_result

Compute distances between points for checking

This checks the result of the spatial join if computed distances are less than the buffered distances.

# Use greatcircle arc length
geod = Geod(ellps='WGS84')

# centers of buffered-circles
from_lon1, from_lon2 = [38.6635, 51.5700]
from_lat1, from_lat2 = [31.6661, 26.6500]
# event locations
to_lon1, to_lon2= [51.5144, 38.7078]
to_lat1, to_lat2 = [27.0779, 31.9974]

_,_, dist_m = geod.inv(from_lon1, from_lat1, to_lon2, to_lat2, radians=False)
print(dist_m) #smaller than 40 km == inside
# Get: 36974.419811328786 m.

_,_, dist_m = geod.inv(from_lon2, from_lat2, to_lon1, to_lat1, radians=False)
print(dist_m) #smaller than 50 km == inside
# Get: 47732.76744655724 m.

My notes

  1. Serious geographic computation should be done directly with geodetic computation without the use of map projection of any kind.
  2. Map projection is used when you need graphic visualization. But correct geographic values that are computed/transformed to map projection CRS correctly are expected.
  3. Computation with map projection (grid) coordinate beyond its allowable limits (and get bad results) often done by inexperienced users.
  4. Computation involving map/grid position/values using euclidean geometry should be performed within small extent of projection areas that all kinds of map distortions is very low.

Upvotes: 2

Related Questions