reverse_geocoder in Python with pandas

Question

I am currently using this code to go through a file and get the district for each row. However, it takes forever to execute, since I have 118185 rows of data to go through. Is there another way to use reverse_geocoder that doesn't take that long?

df["coord"]=list(zip(df["pickup_latitude"],df["pickup_longitude"]))
list1 = []
for x,y in df["coord"]: 
    coordinates=(x,y)
    newItem = rg.search(coordinates)[0].get('admin2')
    list1.append(newItem)

Peter Leimbigler · Accepted Answer

Based on the PyData 2015 demo notebook in the reverse_geocoder GitHub repository, you can pass a tuple of tuples into rg.search() to process multiple coordinate pairs at once.

import reverse_geocoder as rg

# Convert lat and long columns to a tuple of tuples
coords = tuple(zip(df['pickup_latitude'], df['pickup_longitude']))

results_rg = rg.search(coords)
results_admin2 = [x.get('admin2') for x in results_rg]

# Optional: insert admin2 results into new df column
df['admin2'] = results_admin2
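
If many rows share the same pickup coordinates, the batch can be shrunk further by looking up each distinct pair only once. This is not part of the approach above, just a minimal sketch of that idea. The helper name batch_lookup is hypothetical, the search function is passed in as an argument, and the sketch assumes it returns one dict per input pair in the same order as the input, as rg.search is used above.

```python
def batch_lookup(coords, search, field="admin2"):
    """Return `field` for every (lat, lon) pair in `coords`.

    `coords` must be a list (it is iterated twice).
    `search` is any callable taking a tuple of (lat, lon) tuples and
    returning one dict per pair, in input order (assumption).
    """
    # Distinct pairs in first-seen order
    unique = list(dict.fromkeys(coords))
    if not unique:
        return []

    # One batched call for all distinct pairs
    results = search(tuple(unique))

    # Map each distinct pair to its field, then expand back to row order
    by_coord = {c: r.get(field) for c, r in zip(unique, results)}
    return [by_coord[c] for c in coords]
```

With the DataFrame from the question, this could be called as df['admin2'] = batch_lookup(list(zip(df['pickup_latitude'], df['pickup_longitude'])), rg.search). Whether it helps depends on how many duplicate coordinate pairs the data actually contains.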

If this is still too slow, you can run a simple speed test on only the first few rows of df. For example, to run the above code on the first 1000 rows of the DataFrame, replace the coords assignment with this:

coords = tuple(zip(df['pickup_latitude'].iloc[:1000], 
                   df['pickup_longitude'].iloc[:1000]))
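
To put a number on the speed test, the batched call can be wrapped in a timer. The following is a rough sketch rather than a benchmark. The helper name time_sample is hypothetical, and the search function is passed in so that rg.search (or anything with the same call shape) can be measured.

```python
import time

def time_sample(search, coords, n=1000):
    """Time one batched `search` call on the first `n` coordinate pairs.

    Returns (number of pairs searched, elapsed seconds).
    """
    sample = tuple(coords[:n])
    start = time.perf_counter()
    search(sample)
    elapsed = time.perf_counter() - start
    return len(sample), elapsed
```

For example, count, seconds = time_sample(rg.search, list(zip(df['pickup_latitude'], df['pickup_longitude']))). The first call may include one-time setup cost inside the library, so running the test twice and comparing the timings gives a fairer picture before extrapolating to all rows.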
