Reputation: 78
I have a dataframe (df) and a geojson (gdf). In the dataframe I have a three columns, region, region check and geometry - df['geometry] which has point coordinates like: POINT (37.98730 11.09990). The dataframe has 40,000 rows.
I want to iterate through the dataframe and check to see if the coordinates are correctly assigned to the region. The output will check the coordinates against the geojson file and indicate in the new empty column- region_check- the correct column.
I have a loop for this, but it's too slow. I'm hoping someone can advise how to speed up this loop.
Many thanks
import pandas as pd
import numpy as np
import geopandas as gpd
df = pd.read_csv('gis_data_2020_check.csv')
gdf = gpd.read_file('eth_admin1.json')
df['region_check'] = ''
i = 0
count = 0
while i < len(df):
if count < len(gdf):
test = df['geometry'].iloc[i].within(gdf['geometry'].iloc[count])
if test == True:
df['region_check'].iloc[i] = gdf['ADM1_EN'].iloc[count]
i += 1
count = 0
else:
count +=1
Upvotes: 0
Views: 1380
Reputation: 138
Performing a 'spatial join' with two geodataframes would speed up the process enormously, as you would get rid of the loop.
Step 1: Create a geodataframe for your points, with something like:
points = gpd.GeoDataFrame(df, geometry='geometry', crs=gdf.crs)
Step 2: Merge the two geodataframes with 'sjoin', that is, something like:
pointInPolys = gpd.tools.sjoin(points, polys, op="within", how='left')
Btw, I wrote an article about these kind of point-in-polygon tests, with more example code and performance in mind, www.matecdev.com/posts/point-in-polygon.html. I hope it is useful!
For additional performance, make sure to build Geopandas with the optional dependencies Rtrees and pyGEOS.
Upvotes: 2
Reputation: 640
With GeoPandas you can use R-Trees spatial indexing.
You have to install libspatialindex
to get it running:
conda install libspatialindex
Now you can query the indices of intersecting geometries:
spatial_index = gdf.sindex
possible_matches_index = list(spatial_index.intersection(polygon.bounds))
good explanation about this topic: https://geoffboeing.com/2016/10/r-tree-spatial-index-python/
Upvotes: 1