Kristada673

Reputation: 3744

How to make geocoding with Google Maps faster?

This is my code to extract latitudes and longitudes from location addresses in a CSV file.

import pandas as pd
import requests
import json
import time

GOOGLE_MAPS_API_URL = 'https://maps.googleapis.com/maps/api/geocode/json'
API_key= 'the-key'

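def gmaps_geoencoder(address):
    # Pass the query via params so requests URL-encodes the address
    req = requests.get(GOOGLE_MAPS_API_URL, params={'address': address, 'key': API_key})
    res = req.json()
    result = res['results'][0]
    lat = result['geometry']['location']['lat']
    lon = result['geometry']['location']['lng']
    # Return the raw response as well, since the caller unpacks three values
    return lat, lon, res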

input_csv_file = r'path\to\location_list_100.csv'
output_csv_file = r'path\to\location_list_100_new.csv'

df = pd.read_csv(input_csv_file)

#size of chunks of data to write to the csv
chunksize = 10

t = time.time()
for i in range(len(df)):
    place = df['ADDRESS'][i]
    lat, lon, res = gmaps_geoencoder(place)
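    # .loc avoids chained assignment and creates the column if it is missing
    df.loc[i, 'Lat'] = lat
    df.loc[i, 'Lon'] = lon

    df.to_csv(output_csv_file,
          index=False,
          chunksize=chunksize) #rows written per batch; the whole file is rewritten each loop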

print('Time taken: '+str(time.time() - t)+'s')

It took 47.75818920135498s for 100 records, i.e. ~0.5s per record. How do I make it faster? I have ~1 million records to convert, and at this rate it would take almost 6 days to finish! What is taking the time here: iterating through the dataframe, or fetching data with the Google Maps API? If it's the former, I suppose there should be some way to make it faster. But if it's the latter, is there any fix?

Upvotes: 1

Views: 79

Answers (1)

Nihal

Reputation: 5344

Instead of this loop:

for i in range(len(df)):
    place = df['ADDRESS'][i]
    lat, lon, res = gmaps_geoencoder(place)
    df.loc[i, 'Lat'] = lat
    df.loc[i, 'Lon'] = lon

    df.to_csv(output_csv_file,
          index=False,
          chunksize=chunksize)

use apply, and write the CSV once at the end:

df[['Lat', 'Lon', 'res']] = pd.DataFrame(df['ADDRESS'].apply(gmaps_geoencoder).tolist(), index=df.index)

df.to_csv(output_csv_file,
          index=False,
          chunksize=chunksize)

Refer to this link for more info
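
Note that apply still sends one request at a time, so if the network round trip is what dominates (likely at ~0.5s per record), the remaining option is to issue requests concurrently. Below is a minimal sketch of that idea using a thread pool. The helper name geocode_all and the max_workers value are my own assumptions, not part of the original code, and the API enforces usage limits, so the worker count should be tuned against your quota rather than taken as given.

```python
from concurrent.futures import ThreadPoolExecutor


def geocode_all(addresses, geocode, max_workers=8):
    """Call geocode(address) for every address using a thread pool.

    geocode_all is a hypothetical helper; max_workers=8 is an arbitrary
    starting point, not a recommended value.
    Results come back in the same order as the input addresses.
    """
    # Drop duplicate addresses (keeping first-seen order) so that each
    # distinct address costs only one request.
    unique = list(dict.fromkeys(addresses))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each address
        # with its own result.
        cache = dict(zip(unique, pool.map(geocode, unique)))
    return [cache[a] for a in addresses]


# Usage with the code above (assumes gmaps_geoencoder returns lat, lon, res):
# results = geocode_all(df['ADDRESS'].tolist(), gmaps_geoencoder)
# df[['Lat', 'Lon', 'res']] = pd.DataFrame(results, index=df.index)
```

Threads suit this case because each call spends most of its time waiting on the network. If any single request raises (for example an address with no results), the exception propagates out of geocode_all, so you may want to catch errors inside the function you pass in.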

Upvotes: 1
