Reputation: 395
I have a two data frames with lat-lon values and some additional information I want to merge two dataFrames in range of values
df1
lat1 lon1 lat2 lon2 a1 a2 a2
1.0 1.0 4.0 7.0 a d p
2.0 2.0 5.0 8.0 b e q
3.0 3.0 6.0 9.0 c f r
This is of size 30 million rows
df2
lat lon x
1.0 1.0 m
1.0 2.0 n
2.0 3.0 o
This is of size 20 million rows
Now for each row in df1 i want to merge with df2 having all the lat in df2 between lat1 and lat2 and lon in lon1 and lon2
lat1<=lat<=lat2
lon1<=lon<=lon2
I have tried
Best way to join / merge by range in pandas
how to perform an inner or outer join of DataFrames with Pandas on non-simplistic criterion
But still i run out of memory.
What is the most optimal way to do this?
Upvotes: 2
Views: 64
Reputation: 395
import geopandas as gp
from shapely.geometry import Polygon, Point
df2_geo = gp.GeoDataFrame(df2, geometry=gp.points_from_xy(df2.latitude, df2.longitude))
def make_polygon(a,b,c,d):
lat_list = [a,a,c,c]
lon_list = [b,d,d,b]
return Polygon(zip(lat_list, lon_list))
df1['geometry'] = df1.apply(lambda x: make_polygon(x['_from_latitude'], x['_from_longitude'], x['_to_latitude'],x['_to_longitude']),axis=1)
df1_gep = gp.GeoDataFrame(df1)
result = geopandas.sjoin(df2_geo, df1_gep, how="inner", op='intersects')
@SubbuVidyaSekar Thanks for suggestion
This works perfectly fine
Upvotes: 0