dmin
dmin

Reputation: 78

How to improve loop performance on geospatial data in python, pandas, geopandas

I have a dataframe (df) and a geojson (gdf). In the dataframe I have a three columns, region, region check and geometry - df['geometry] which has point coordinates like: POINT (37.98730 11.09990). The dataframe has 40,000 rows.

I want to iterate through the dataframe and check to see if the coordinates are correctly assigned to the region. The output will check the coordinates against the geojson file and indicate in the new empty column- region_check- the correct column.

I have a loop for this, but it's too slow. I'm hoping someone can advise how to speed up this loop.

Many thanks

import pandas as pd
import numpy as np
import geopandas as gpd

df = pd.read_csv('gis_data_2020_check.csv')
gdf = gpd.read_file('eth_admin1.json')
df['region_check'] = ''

i = 0
count = 0
while i < len(df):
    if count < len(gdf):
        test = df['geometry'].iloc[i].within(gdf['geometry'].iloc[count])
        if test == True:
            df['region_check'].iloc[i] = gdf['ADM1_EN'].iloc[count]
            i += 1
            count = 0
        else:
            count +=1

Upvotes: 0

Views: 1380

Answers (2)

Mart&#237;n Maas
Mart&#237;n Maas

Reputation: 138

Performing a 'spatial join' with two geodataframes would speed up the process enormously, as you would get rid of the loop.

Step 1: Create a geodataframe for your points, with something like:

points = gpd.GeoDataFrame(df, geometry='geometry', crs=gdf.crs)

Step 2: Merge the two geodataframes with 'sjoin', that is, something like:

pointInPolys = gpd.tools.sjoin(points, polys, op="within", how='left')

Btw, I wrote an article about these kind of point-in-polygon tests, with more example code and performance in mind, www.matecdev.com/posts/point-in-polygon.html. I hope it is useful!

For additional performance, make sure to build Geopandas with the optional dependencies Rtrees and pyGEOS.

Upvotes: 2

megges
megges

Reputation: 640

With GeoPandas you can use R-Trees spatial indexing. You have to install libspatialindex to get it running:

conda install libspatialindex

Now you can query the indices of intersecting geometries:

spatial_index = gdf.sindex
possible_matches_index = list(spatial_index.intersection(polygon.bounds))

good explanation about this topic: https://geoffboeing.com/2016/10/r-tree-spatial-index-python/

Upvotes: 1

Related Questions