Reputation: 1660
Please help me speed up my code
I have points with two coordinates each (dataframe df1). Each row in df2 defines a rectangular zone by the coordinates of its bottom-left and top-right corners, and every zone has a zone_id. For every row of df1 (a point with two coordinates) I want to get the zone_id from dataframe df2. My code is:
def zone_map(df1, df2):
    df2['zone_id'] = df2.index
    for t, t2 in df2.iterrows():
        # parentheses let the condition span several lines
        mask = ((df1['lat'] >= df2.loc[t, 'lat_bl'])
                & (df1['lat'] < df2.loc[t, 'lat_tr'])
                & (df1['lon'] >= df2.loc[t, 'lon_bl'])
                & (df1['lon'] < df2.loc[t, 'lon_tr']))
        for col in ['zone_id', 'lat_bl', 'lon_bl', 'lat_tr', 'lon_tr']:
            df1.loc[mask, col] = df2.loc[t, col]
    return df1

df_nodes = zone_map(df, df_zones)
The data looks like this:
import pandas as pd

df_zones = pd.DataFrame()
df_zones['zone_id'] = [0, 1, 2, 3]
df_zones['lon_bl'] = [0, 0.1, 0, 0.1]
df_zones['lat_bl'] = [0, 0.1, 0.1, 0]
df_zones['lon_tr'] = [0.1, 0.2, 0.1, 0.2]
df_zones['lat_tr'] = [0.1, 0.2, 0.2, 0.1]

df = pd.DataFrame()
df['lon'] = [0.3, 0.05, 0.11]
df['lat'] = [0.3, 0.05, 0.05]
Thank you!
Upvotes: 1
Views: 59
Reputation: 210842
Using GeoPandas:
import geopandas as gpd
from shapely.geometry import Point, Polygon
def box_to_poly(r):
    return Polygon([(r['lon_bl'], r['lat_bl']),
                    (r['lon_bl'], r['lat_tr']),
                    (r['lon_tr'], r['lat_tr']),
                    (r['lon_tr'], r['lat_bl'])])
z = gpd.GeoDataFrame(df_zones['zone_id'], geometry=df_zones.apply(box_to_poly, axis=1))
p = gpd.GeoDataFrame(geometry=df[['lon','lat']].apply(Point, axis=1))
This gives us the following GeoPandas DFs:
In [119]: z
Out[119]:
            zone_id                                           geometry
index_left
0                 0        POLYGON ((0 0, 0 0.1, 0.1 0.1, 0.1 0, 0 0))
1                 1  POLYGON ((0.1 0.1, 0.1 0.2, 0.2 0.2, 0.2 0.1, ...
2                 2  POLYGON ((0 0.1, 0 0.2, 0.1 0.2, 0.1 0.1, 0 0.1))
3                 3  POLYGON ((0.1 0, 0.1 0.1, 0.2 0.1, 0.2 0, 0.1 0))
In [120]: p
Out[120]:
            geometry
0    POINT (0.3 0.3)
1  POINT (0.05 0.05)
2  POINT (0.11 0.05)
Now we can use a spatial join:
In [121]: gpd.sjoin(p, z, how='left')
Out[121]:
            geometry  index_right  zone_id
0    POINT (0.3 0.3)          NaN      NaN
1  POINT (0.05 0.05)          0.0      0.0
2  POINT (0.11 0.05)          3.0      3.0
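If GeoPandas is not an option, a similar lookup can probably be done in plain pandas/NumPy by comparing all points against all boxes at once with broadcasting. The following is only a sketch under some assumptions: the function name `zone_lookup` is made up for illustration, the zones are assumed not to overlap (the first matching box wins), and the half-open comparisons (`>=` bottom-left, `<` top-right) are copied from the question.

```python
import numpy as np
import pandas as pd

def zone_lookup(points, zones):
    """Return a copy of `points` with a zone_id column (NaN if no box matches).

    Hypothetical helper, not part of pandas or GeoPandas.
    """
    lat = points['lat'].to_numpy()[:, None]   # shape (n_points, 1)
    lon = points['lon'].to_numpy()[:, None]
    # Broadcasting compares every point with every box: shape (n_points, n_zones).
    inside = ((lat >= zones['lat_bl'].to_numpy())
              & (lat < zones['lat_tr'].to_numpy())
              & (lon >= zones['lon_bl'].to_numpy())
              & (lon < zones['lon_tr'].to_numpy()))
    found = inside.any(axis=1)                # False where no box contains the point
    first = inside.argmax(axis=1)             # position of the first matching box
    out = points.copy()
    out['zone_id'] = np.where(found, zones['zone_id'].to_numpy()[first], np.nan)
    return out

df_zones = pd.DataFrame({'zone_id': [0, 1, 2, 3],
                         'lon_bl': [0, 0.1, 0, 0.1],
                         'lat_bl': [0, 0.1, 0.1, 0],
                         'lon_tr': [0.1, 0.2, 0.1, 0.2],
                         'lat_tr': [0.1, 0.2, 0.2, 0.1]})
df = pd.DataFrame({'lon': [0.3, 0.05, 0.11], 'lat': [0.3, 0.05, 0.05]})

result = zone_lookup(df, df_zones)
print(result)
```

Note that the intermediate boolean matrix has n_points x n_zones entries, so for very large inputs it may be necessary to process the points in chunks.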
Upvotes: 1
Reputation: 210842
The inner loop:
        for col in ['zone_id', 'lat_bl', 'lon_bl', 'lat_tr', 'lon_tr']:
            df1.loc[mask, col] = df2.loc[t, col]
can be replaced like this:
# put this line before the first loop
cols = ['zone_id', 'lat_bl', 'lon_bl', 'lat_tr', 'lon_tr']
# ...
df1.loc[mask, cols] = df2.loc[t, cols].to_numpy()  # plain values, so pandas does not try to align the Series by label
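Put together, the function from the question might look roughly like this. It is a sketch rather than a drop-in replacement: it works on copies instead of modifying the inputs, creates the target columns up front, skips zones that contain no points, and assigns the row as a plain array via `.to_numpy()` so that no label alignment is involved. All of these are my own choices, not something the question requires.

```python
import numpy as np
import pandas as pd

def zone_map(df1, df2):
    cols = ['zone_id', 'lat_bl', 'lon_bl', 'lat_tr', 'lon_tr']
    df1 = df1.copy()                 # assumption: leave the caller's frames untouched
    df2 = df2.copy()
    df2['zone_id'] = df2.index
    for col in cols:                 # create the target columns up front, filled with NaN
        df1[col] = np.nan
    for t, row in df2.iterrows():
        mask = ((df1['lat'] >= row['lat_bl'])
                & (df1['lat'] < row['lat_tr'])
                & (df1['lon'] >= row['lon_bl'])
                & (df1['lon'] < row['lon_tr']))
        if not mask.any():           # nothing to assign for this zone
            continue
        # one assignment for all five columns instead of the inner loop
        df1.loc[mask, cols] = row[cols].to_numpy()
    return df1

df_zones = pd.DataFrame({'zone_id': [0, 1, 2, 3],
                         'lon_bl': [0, 0.1, 0, 0.1],
                         'lat_bl': [0, 0.1, 0.1, 0],
                         'lon_tr': [0.1, 0.2, 0.1, 0.2],
                         'lat_tr': [0.1, 0.2, 0.2, 0.1]})
df = pd.DataFrame({'lon': [0.3, 0.05, 0.11], 'lat': [0.3, 0.05, 0.05]})

df_nodes = zone_map(df, df_zones)
print(df_nodes)
```

This still loops over the zones, so it only removes the inner loop; the cost per zone stays proportional to the number of points.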
P.S. I would consider using GeoPandas for such tasks...
Upvotes: 1