Bimal Gangawal
Bimal Gangawal

Reputation: 395

Merging huge dataFrame in range of values

I have a two data frames with lat-lon values and some additional information I want to merge two dataFrames in range of values

df1 

lat1 lon1 lat2 lon2 a1 a2 a2
1.0   1.0  4.0  7.0 a  d  p
2.0   2.0  5.0  8.0 b  e  q
3.0   3.0  6.0  9.0 c  f  r

This is of size 30 million rows

df2

lat  lon  x 
1.0  1.0  m
1.0  2.0  n
2.0  3.0  o

This is of size 20 million rows

Now for each row in df1 i want to merge with df2 having all the lat in df2 between lat1 and lat2 and lon in lon1 and lon2

lat1<=lat<=lat2
lon1<=lon<=lon2

I have tried
Best way to join / merge by range in pandas
how to perform an inner or outer join of DataFrames with Pandas on non-simplistic criterion

But still i run out of memory.
What is the most optimal way to do this?

Upvotes: 2

Views: 64

Answers (1)

Bimal Gangawal
Bimal Gangawal

Reputation: 395

import geopandas as gp
from shapely.geometry import Polygon, Point
df2_geo = gp.GeoDataFrame(df2, geometry=gp.points_from_xy(df2.latitude, df2.longitude))

def make_polygon(a,b,c,d):
    lat_list = [a,a,c,c]
    lon_list = [b,d,d,b]
    return Polygon(zip(lat_list, lon_list))

df1['geometry'] = df1.apply(lambda x: make_polygon(x['_from_latitude'], x['_from_longitude'], x['_to_latitude'],x['_to_longitude']),axis=1)              
df1_gep = gp.GeoDataFrame(df1)
result = geopandas.sjoin(df2_geo, df1_gep, how="inner", op='intersects')

@SubbuVidyaSekar Thanks for suggestion
This works perfectly fine

Upvotes: 0

Related Questions