Reputation: 2542
I have a dataframe (call it A) with a set of GPS lat/long coordinates:
Lat | Long
28.6752213, 77.09311140000001
I have another CSV (call it B, with over a million rows). It is basically a grid: each row holds the lat/long coordinates of the 4 corners of one box.
The Problem
I need to find, for every row in A, which (non-unique) row in B bounds it; that is, the GPS coordinates lie inside the box described by that row of B. I have a function that returns True/False when given the coords from A and a row in B.
Right now I'm using a brute-force approach: iterating through the whole B dataframe and checking, row by row, whether the point belongs to that box. However, this is incredibly inefficient and very slow.
I'm sure there must be a better way to do this, as it's a common problem. Can anyone point me to one?
Thank you! :)
Edit:
Code for the function I'm using to check whether a particular gps_coord belongs in the box defined by a row:
import matplotlib.path as path

def find_if_point_in_bounding_box(row, gps_coords):
    top_left_lat = row['top_left_lat']
    top_left_long = row['top_left_long']
    top_right_lat = row['top_right_lat']
    top_right_long = row['top_right_long']
    bottom_left_lat = row['bottom_left_lat']
    bottom_left_long = row['bottom_left_long']
    bottom_right_lat = row['bottom_right_lat']
    bottom_right_long = row['bottom_right_long']
    lat, long = gps_coords
    # create box
    p = path.Path([
        (top_left_lat, top_left_long),
        (top_right_lat, top_right_long),
        (bottom_left_lat, bottom_left_long),
        (bottom_right_lat, bottom_right_long),
    ])
    res = p.contains_points([(lat, long)])[0]
    return res
Upvotes: 0
Views: 827
Reputation: 1122022
Your 8 coordinates contain only 4 unique values: 2 latitudes (forming the top and bottom boundaries of each box, or the northerly and southerly bounds) and 2 longitudes (the left and right boundaries, westerly and easterly). Across the 4 (lat, lon) combinations for the 4 corners you'll see that the values repeat. You only need to compare your positions with the 4 boundaries: the latitude should fall between (or on) the two latitude bounds, and the longitude should fall between (or on) the two longitude bounds.
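The claim about repeated values can be checked on your own data before relying on it. The following is a minimal sketch, assuming the column names from the question; the one-row `boxes` frame and the names `n_lats`, `n_longs` and `axis_aligned` are made up for illustration.

```python
import pandas as pd

# Hypothetical one-row sample of B, using the column names from the question.
boxes = pd.DataFrame({
    "top_left_lat": [28.70], "top_left_long": [77.00],
    "top_right_lat": [28.70], "top_right_long": [77.10],
    "bottom_left_lat": [28.60], "bottom_left_long": [77.00],
    "bottom_right_lat": [28.60], "bottom_right_long": [77.10],
})

lat_cols = [c for c in boxes.columns if c.endswith("_lat")]
long_cols = [c for c in boxes.columns if c.endswith("_long")]

# Count the distinct latitudes and longitudes in each row.
n_lats = boxes[lat_cols].nunique(axis=1)
n_longs = boxes[long_cols].nunique(axis=1)

# A row is a plain axis-aligned box if it has at most 2 of each.
axis_aligned = (n_lats <= 2) & (n_longs <= 2)
print(axis_aligned.all())
```

If any row fails this check, the box is rotated or skewed and comparing against 4 boundaries would not be enough for that row.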
So you can simply ask for rows that have matching bounding boxes based on one each of top_*_lat and bottom_*_lat for the latitude, and one each of *_left_long and *_right_long for the longitudes:
lat, long = <latitude>, <longitude>
matching_rows = df.query(
    # top and bottom latitudes, top lat > bottom lat, north to south
    "top_left_lat >= @lat >= bottom_right_lat and "
    # left and right longitudes, left long < right long, west to east
    "top_left_long <= @long <= bottom_right_long"
)
The above pandas.DataFrame.query() expression just does a simple geometric point containment test and assumes that your bounding boxes neither cross the antimeridian (the International Date Line) nor overlap with either pole.
You'll have to do this for each position in your input dataframe; Pandas can't merge dataframes based on arbitrary expressions (yet). You could group your inputs by one of the two coordinates to produce a subset of rows that match that one coordinate, then further filter them on the second coordinate for each group.
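As a rough illustration of running the test once per position, here is a sketch that pulls the four bounds out as NumPy arrays once and then builds a boolean mask for each point. The sample data and the helper name `matching_box_indices` are hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
points = pd.DataFrame({
    "Lat": [28.6752213, 10.0],
    "Long": [77.09311140000001, 10.0],
})
boxes = pd.DataFrame({
    "top_left_lat": [28.70, 28.70, 28.60],
    "top_left_long": [77.00, 77.10, 77.00],
    "bottom_right_lat": [28.60, 28.60, 28.50],
    "bottom_right_long": [77.10, 77.20, 77.10],
})

# Extract the 4 bounds once so each lookup is a few array comparisons.
north = boxes["top_left_lat"].to_numpy()
south = boxes["bottom_right_lat"].to_numpy()
west = boxes["top_left_long"].to_numpy()
east = boxes["bottom_right_long"].to_numpy()

def matching_box_indices(lat, long):
    """Return the index labels of all boxes that contain (lat, long)."""
    mask = (south <= lat) & (lat <= north) & (west <= long) & (long <= east)
    return boxes.index[mask].tolist()

matches = [
    matching_box_indices(lat, long)
    for lat, long in zip(points["Lat"], points["Long"])
]
print(matches)
```

This still scans every box for every point, so it only removes the per-row Python overhead; it does not change the overall amount of comparison work.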
If your input dataframe is also very large, then it may be better to use a database for such a join.
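To give an idea of what such a join could look like, here is a small sketch using the standard-library sqlite3 module with an in-memory database. The table layout (`points`, `boxes`, and the `north`/`south`/`west`/`east` columns) is an assumption made for this example, not something taken from your data, and any other SQL database would work along the same lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points (id INTEGER PRIMARY KEY, lat REAL, lon REAL);
    CREATE TABLE boxes (
        id INTEGER PRIMARY KEY,
        north REAL, south REAL, west REAL, east REAL
    );
""")

# Hypothetical sample rows standing in for dataframes A and B.
conn.executemany(
    "INSERT INTO points VALUES (?, ?, ?)",
    [(1, 28.6752213, 77.09311140000001), (2, 10.0, 10.0)],
)
conn.executemany(
    "INSERT INTO boxes VALUES (?, ?, ?, ?, ?)",
    [(1, 28.70, 28.60, 77.00, 77.10), (2, 28.70, 28.60, 77.10, 77.20)],
)

# Join every point to every box whose bounds contain it.
rows = conn.execute("""
    SELECT p.id, b.id
    FROM points AS p
    JOIN boxes AS b
      ON p.lat BETWEEN b.south AND b.north
     AND p.lon BETWEEN b.west AND b.east
    ORDER BY p.id, b.id
""").fetchall()
print(rows)
conn.close()
```

Points that fall in no box simply produce no rows here; a LEFT JOIN would keep them with a NULL box id instead.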
Upvotes: 2