Reputation: 3744
This is my code to extract latitudes and longitudes from location addresses in a CSV file.
import pandas as pd
import requests
import time

GOOGLE_MAPS_API_URL = 'https://maps.googleapis.com/maps/api/geocode/json'
API_key = 'the-key'

def gmaps_geoencoder(address):
    # Let requests build and URL-encode the query string
    req = requests.get(GOOGLE_MAPS_API_URL,
                       params={'address': address, 'key': API_key})
    res = req.json()
    result = res['results'][0]
    lat = result['geometry']['location']['lat']
    lon = result['geometry']['location']['lng']
    # Return the raw response too, since the loop below unpacks three values
    return lat, lon, res
input_csv_file = r'path\to\location_list_100.csv'
output_csv_file = r'path\to\location_list_100_new.csv'
df = pd.read_csv(input_csv_file)
# size of chunks of data to write to the csv
chunksize = 10
t = time.time()
for i in range(len(df)):
    place = df['ADDRESS'][i]
    lat, lon, res = gmaps_geoencoder(place)
    df.loc[i, 'Lat'] = lat  # .loc avoids chained-assignment warnings
    df.loc[i, 'Lon'] = lon
    df.to_csv(output_csv_file,
              index=False,
              chunksize=chunksize)  # rewrites the whole file on every iteration
print('Time taken: ' + str(time.time() - t) + 's')
It took 47.75818920135498s for 100 records, i.e. ~0.5s per record. How do I make it faster? I have ~1 million records to convert, and at this rate it would take almost 6 days to finish the process! What is taking the time here: iterating through the dataframe, or fetching data with the gmaps API? If it's the former, I suppose there should be some way to speed it up. But if it's the latter, is there any fix?
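One quick way to check would be to time a single API call in isolation (a minimal sketch reusing gmaps_geoencoder from above); if one round-trip alone takes ~0.5s, the network, not pandas, is the bottleneck:

t0 = time.time()
gmaps_geoencoder(df['ADDRESS'][0])  # one isolated request
print('One API round-trip: ' + str(time.time() - t0) + 's')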
Upvotes: 1
Views: 79
Reputation: 5344
Instead of this:
for i in range(len(df)):
    place = df['ADDRESS'][i]
    lat, lon, res = gmaps_geoencoder(place)
    df.loc[i, 'Lat'] = lat
    df.loc[i, 'Lon'] = lon
    df.to_csv(output_csv_file,
              index=False,
              chunksize=chunksize)
use this, which applies the geocoder once per row and, more importantly, writes the CSV only once instead of rewriting it on every iteration:
df[['Lat', 'Lon', 'res']] = pd.DataFrame(
    df['ADDRESS'].apply(gmaps_geoencoder).values.tolist())
df.to_csv(output_csv_file,
          index=False,
          chunksize=chunksize)
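If the API round-trips themselves dominate, the apply version still issues one request at a time. A sketch of overlapping the requests with a thread pool (it reuses gmaps_geoencoder from the question; max_workers=20 is an arbitrary choice, and you would still need to respect the API's rate limits and handle failed lookups):

from concurrent.futures import ThreadPoolExecutor

# Each request spends most of its time waiting on the network,
# so threads can overlap many round-trips at once.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(gmaps_geoencoder, df['ADDRESS']))

# Unpack the (lat, lon, res) tuples into three columns, then write once.
df['Lat'], df['Lon'], df['res'] = zip(*results)
df.to_csv(output_csv_file, index=False, chunksize=chunksize)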
Refer to this link for more info
Upvotes: 1