Reputation: 425
I have two dataframes: one with a column of points, and another with a column of polygons. The data looks like this:
>>> df1
Index Point
0 1 POINT (100 400)
1 2 POINT (920 400)
2 3 POINT (111 222)
>>> df2
Index Area-ID Polygon
0 1 New York POLYGON ((226000 619000, 226000 619500, 226500...
1 2 Amsterdam POLYGON ((226000 619000, 226000 619500, 226500...
2 3 Berlin POLYGON ((226000 619000, 226000 619500, 226500...
Reproducible example:
import pandas as pd
import shapely.wkt
data = {'Index': [1, 2, 3],
        'Point': ['POINT (100 400)', 'POINT (920 400)', 'POINT (111 222)']}
df1 = pd.DataFrame(data)
df1['Point'] = df1['Point'].apply(shapely.wkt.loads)
data = {'Index': [1, 2, 3],
        'Area-ID': ['New York', 'Amsterdam', 'Berlin'],
        'Polygon': ['POLYGON ((90 390, 110 390, 110 410, 90 410, 90 390))',
                    'POLYGON ((890 390, 930 390, 930 410, 890 410, 890 390))',
                    'POLYGON ((110 220, 112 220, 112 225, 110 225, 110 220))']}
df2 = pd.DataFrame(data)
df2['Polygon'] = df2['Polygon'].apply(shapely.wkt.loads)
With shapely's function 'polygon.contains' I can check whether a polygon contains a certain point. The goal is to find the corresponding polygon for every point in dataframe 1.
The following approach works, but takes way too long considering the datasets are very large:
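# Nested loop over both dataframes: one contains() call per (point, polygon) pair
for idx1, row1 in df1.iterrows():
    print(idx1)
    for _, row2 in df2.iterrows():
        if row2['Polygon'].contains(row1['Point']):
            # .loc writes into df1 itself; chained .iloc[...][...] assignment may not
            df1.loc[idx1, 'Area-ID'] = row2['Area-ID']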
Is there a more time-efficient way to achieve this goal?
Upvotes: 1
Views: 154
Reputation: 7912
If every point is contained in exactly one polygon (as is the case in the current form of the question), you can do:
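# .iloc is used because enumerate() yields positions, not index labels.
# The trailing [0] raises IndexError if no polygon contains the point.
df1 = df1.assign(
    cities=df1['Point'].apply(
        lambda point: df2['Area-ID'].iloc[
            [i for i, polygon in enumerate(df2['Polygon'])
             if polygon.contains(point)][0]
        ]
    )
)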
You'll get:
Index Point cities
0 1 POINT (100 400) New York
1 2 POINT (920 400) Amsterdam
2 3 POINT (111 222) Berlin
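Note that the snippet above still tests every point against every polygon. As a rough sketch of one possible refinement (not benchmarked here), you could prepare each polygon once with shapely.prepared.prep, which is intended for repeated predicate checks against the same geometry, and stop at the first match. The helper name find_area and the new 'Area-ID' column on df1 are my own choices for illustration, and returning None for a point that lies in no polygon is an assumption about the behaviour you want:

```python
import pandas as pd
import shapely.wkt
from shapely.prepared import prep

# Same toy data as in the question.
df1 = pd.DataFrame({
    'Index': [1, 2, 3],
    'Point': ['POINT (100 400)', 'POINT (920 400)', 'POINT (111 222)'],
})
df1['Point'] = df1['Point'].apply(shapely.wkt.loads)

df2 = pd.DataFrame({
    'Index': [1, 2, 3],
    'Area-ID': ['New York', 'Amsterdam', 'Berlin'],
    'Polygon': ['POLYGON ((90 390, 110 390, 110 410, 90 410, 90 390))',
                'POLYGON ((890 390, 930 390, 930 410, 890 410, 890 390))',
                'POLYGON ((110 220, 112 220, 112 225, 110 225, 110 220))'],
})
df2['Polygon'] = df2['Polygon'].apply(shapely.wkt.loads)

# Prepare every polygon once, paired with its area name.
prepared = [(prep(polygon), area)
            for polygon, area in zip(df2['Polygon'], df2['Area-ID'])]


def find_area(point):
    """Hypothetical helper: name of the first polygon containing `point`, else None."""
    return next((area for polygon, area in prepared if polygon.contains(point)), None)


df1['Area-ID'] = df1['Point'].apply(find_area)
print(df1)
```

This is still a linear scan over the polygons for each point, so for very large inputs a spatial index is probably the better route (shapely's STRtree or a geopandas spatial join, not shown here).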
Upvotes: 1