Yury Wallet

Reputation: 1660

Get zone_id for points inside a box area (i.e. if a box contains the point, get its box_id)

Please help me speed up my code.
Each row of df1 is a point with two coordinates (lat, lon). Each row of df2 defines a box area by the coordinates of its bottom-left and top-right corners, and every box has a zone_id. For every point in df1 I want to get the zone_id of the box in df2 that contains it. My code is:

def zone_map(df1, df2):
    df2['zone_id'] = df2.index

    for t, t2 in df2.iterrows():
        # points inside box t: bottom-left edges inclusive, top-right edges exclusive
        mask = ((df1['lat'] >= df2.loc[t, 'lat_bl'])
                & (df1['lat'] < df2.loc[t, 'lat_tr'])
                & (df1['lon'] >= df2.loc[t, 'lon_bl'])
                & (df1['lon'] < df2.loc[t, 'lon_tr']))
        for col in ['zone_id', 'lat_bl', 'lon_bl', 'lat_tr', 'lon_tr']:
            df1.loc[mask, col] = df2.loc[t, col]

    return df1

df_nodes=zone_map(df, df_zones)

Data looks like

import pandas as pd

df_zones=pd.DataFrame()
df_zones['zone_id']=[0,1,2,3]
df_zones['lon_bl']=[0,0.1,0,0.1]
df_zones['lat_bl']=[0,0.1,0.1,0]
df_zones['lon_tr']=[0.1,0.2,0.1,0.2]
df_zones['lat_tr']=[0.1,0.2,0.2,0.1]

df=pd.DataFrame()
df['lon']=[0.3, 0.05, 0.11]
df['lat']=[0.3, 0.05, 0.05]

Thank you!

Upvotes: 1

Views: 59

Answers (2)

MaxU - stand with Ukraine

Reputation: 210842

Using GeoPandas:

import geopandas as gpd
from shapely.geometry import Point, Polygon

def box_to_poly(r):
    return Polygon([(r['lon_bl'], r['lat_bl']),
                    (r['lon_bl'], r['lat_tr']),
                    (r['lon_tr'], r['lat_tr']),
                    (r['lon_tr'], r['lat_bl'])])

z = gpd.GeoDataFrame(df_zones['zone_id'], geometry=df_zones.apply(box_to_poly, axis=1))
p = gpd.GeoDataFrame(geometry=df[['lon','lat']].apply(Point, axis=1))

This gives us the following GeoPandas DataFrames:

In [119]: z
Out[119]:
            zone_id                                           geometry
index_left
0                 0        POLYGON ((0 0, 0 0.1, 0.1 0.1, 0.1 0, 0 0))
1                 1  POLYGON ((0.1 0.1, 0.1 0.2, 0.2 0.2, 0.2 0.1, ...
2                 2  POLYGON ((0 0.1, 0 0.2, 0.1 0.2, 0.1 0.1, 0 0.1))
3                 3  POLYGON ((0.1 0, 0.1 0.1, 0.2 0.1, 0.2 0, 0.1 0))

In [120]: p
Out[120]:
            geometry
0    POINT (0.3 0.3)
1  POINT (0.05 0.05)
2  POINT (0.11 0.05)

Now we can use a spatial join:

In [121]: gpd.sjoin(p, z, how='left')
Out[121]:
            geometry  index_right  zone_id
0    POINT (0.3 0.3)          NaN      NaN
1  POINT (0.05 0.05)          0.0      0.0
2  POINT (0.11 0.05)          3.0      3.0
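
For reference, here is a more compact sketch of the same idea, assuming a reasonably recent GeoPandas and Shapely. It builds the rectangles with shapely.geometry.box and the points with gpd.points_from_xy instead of row-wise apply; the variable names zone_geoms and joined are only illustrative. Note that the default spatial join predicate (intersects) also matches points lying exactly on a box edge, so a point on a shared edge can match more than one zone, unlike the half-open comparisons in the question.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

df_zones = pd.DataFrame({
    "zone_id": [0, 1, 2, 3],
    "lon_bl": [0, 0.1, 0, 0.1],
    "lat_bl": [0, 0.1, 0.1, 0],
    "lon_tr": [0.1, 0.2, 0.1, 0.2],
    "lat_tr": [0.1, 0.2, 0.2, 0.1],
})
df = pd.DataFrame({"lon": [0.3, 0.05, 0.11], "lat": [0.3, 0.05, 0.05]})

# box(minx, miny, maxx, maxy) builds each rectangle directly
zone_geoms = [
    box(lon_bl, lat_bl, lon_tr, lat_tr)
    for lon_bl, lat_bl, lon_tr, lat_tr in zip(
        df_zones["lon_bl"], df_zones["lat_bl"],
        df_zones["lon_tr"], df_zones["lat_tr"],
    )
]
z = gpd.GeoDataFrame(df_zones[["zone_id"]], geometry=zone_geoms)

# points_from_xy takes x (longitude) first, then y (latitude)
p = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["lon"], df["lat"]))

# left join keeps points that fall in no zone (zone_id is NaN for them)
joined = gpd.sjoin(p, z, how="left")
```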

Upvotes: 1

MaxU - stand with Ukraine

Reputation: 210842

The inner loop:

for col in ['zone_id', 'lat_bl', 'lon_bl', 'lat_tr', 'lon_tr']:
    df1.loc[mask, col] = df2.loc[t,col]

can be replaced like this:

# put this line before the first loop 
cols = ['zone_id', 'lat_bl', 'lon_bl', 'lat_tr', 'lon_tr']

# ...

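# .to_numpy() matters here: a Series on the right-hand side would be
# aligned to the row labels of df1 and the assignment would write NaN
df1.loc[mask, cols] = df2.loc[t, cols].to_numpy()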

PS: I would consider using GeoPandas for such tasks.
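
If pulling in GeoPandas is not an option, the outer loop over boxes can also be removed with NumPy broadcasting. The following is only a sketch under some assumptions: zone_map_vectorized is a hypothetical name, the boxes are assumed not to overlap (the first matching box is used, whereas the loop in the question keeps the last one), there is at least one box, and the intermediate boolean array has one cell per point/box pair, so memory grows with len(df1) * len(df2).

```python
import numpy as np
import pandas as pd

def zone_map_vectorized(points, zones):
    """Return a copy of points with the matching zone columns (NaN if no box matches)."""
    lat = points["lat"].to_numpy()[:, None]  # shape (n_points, 1)
    lon = points["lon"].to_numpy()[:, None]
    # Broadcasting against (n_zones,) arrays gives shape (n_points, n_zones)
    inside = (
        (lat >= zones["lat_bl"].to_numpy())
        & (lat < zones["lat_tr"].to_numpy())
        & (lon >= zones["lon_bl"].to_numpy())
        & (lon < zones["lon_tr"].to_numpy())
    )
    hit = inside.any(axis=1)       # does any box contain the point?
    first = inside.argmax(axis=1)  # position of the first matching box

    cols = ["zone_id", "lat_bl", "lon_bl", "lat_tr", "lon_tr"]
    matched = zones[cols].iloc[first].reset_index(drop=True).astype(float)
    matched.loc[~hit, :] = np.nan  # blank out rows for points outside every box
    matched.index = points.index
    return pd.concat([points, matched], axis=1)

# Sample data from the question
df_zones = pd.DataFrame({
    "zone_id": [0, 1, 2, 3],
    "lon_bl": [0, 0.1, 0, 0.1],
    "lat_bl": [0, 0.1, 0.1, 0],
    "lon_tr": [0.1, 0.2, 0.1, 0.2],
    "lat_tr": [0.1, 0.2, 0.2, 0.1],
})
df = pd.DataFrame({"lon": [0.3, 0.05, 0.11], "lat": [0.3, 0.05, 0.05]})

out = zone_map_vectorized(df, df_zones)
```

Unlike the original function, this sketch does not modify df1 or df2 in place; it returns a new frame.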

Upvotes: 1
