Reputation: 23
I am currently using this code to go through a file and get the district for each row. However, it takes forever to execute, since I have 118185 rows of data to go through.
Is there another way to use reverse_geocoder that doesn't take that long?
df["coord"] = list(zip(df["pickup_latitude"], df["pickup_longitude"]))
list1 = []
for x, y in df["coord"]:
    coordinates = (x, y)
    newItem = rg.search(coordinates)[0].get('admin2')
    list1.append(newItem)
Upvotes: 2
Views: 1258
Reputation: 11105
Based on the PyData 2015 demo notebook in the reverse_geocoder GitHub repository, you can pass a tuple of tuples into rg.search() to process multiple coordinate pairs at once.
# Convert lat and long columns to a tuple of tuples
coords = tuple(zip(df['pickup_latitude'], df['pickup_longitude']))
results_rg = rg.search(coords)
results_admin2 = [x.get('admin2') for x in results_rg]
# Optional: insert admin2 results into new df column
df['admin2'] = results_admin2
If this is still too slow, you can run a simple speed test using only the first few rows of df. For example, to run the above code on the first 1000 rows of the DataFrame, change the line that builds coords to this:
coords = tuple(zip(df['pickup_latitude'].iloc[:1000],
                   df['pickup_longitude'].iloc[:1000]))
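To put a number on that speed test, one option is to time the batched call on the sample and scale the result up to the full row count. The sketch below is only an illustration: time_search and fake_search are hypothetical names that are not part of reverse_geocoder, and fake_search is a stand-in so the example runs without the library installed. In real use you would pass rg.search instead. The linear extrapolation is a rough assumption, and it will likely overestimate if the first call also pays a one-time data loading cost.

```python
import time


def time_search(search_fn, coords, sample_size=1000):
    """Time search_fn on the first sample_size pairs and extrapolate linearly.

    search_fn is any callable that takes a tuple of (lat, lon) tuples and
    returns one result per pair (for example rg.search).
    """
    sample = tuple(coords[:sample_size])
    if not sample:
        raise ValueError("need at least one coordinate pair to time")
    start = time.perf_counter()
    results = search_fn(sample)
    elapsed = time.perf_counter() - start
    # Assumption: run time grows roughly linearly with the number of rows
    estimated_total = elapsed * len(coords) / len(sample)
    return results, elapsed, estimated_total


def fake_search(batch):
    # Hypothetical stand-in for rg.search, returning dicts with an 'admin2' key
    return [{"admin2": "district-%d" % i} for i, _ in enumerate(batch)]


# Made-up coordinates, used only to exercise the helper
coords = tuple((40.0 + i * 1e-4, -74.0) for i in range(5000))
results, elapsed, estimated_total = time_search(fake_search, coords, sample_size=1000)
print(f"{len(results)} rows took {elapsed:.4f}s; "
      f"estimated {estimated_total:.4f}s for all {len(coords)} rows")
```

With the DataFrame from the question, coords would be the full tuple built from pickup_latitude and pickup_longitude, and the estimate tells you roughly how long the full 118185-row run should take before you commit to it.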
Upvotes: 3