Problem running a python function against a pandas dataframe and returning values into the dataframe

Question

I've got a list of over 1mm lat/long pairs that I'm looking to reverse geocode using the reverse-geocoder library. I've got my dataframe set up with columns for lat, long, and a pair which I constructed as (lat, long).

My code:

import numpy as np
import scipy
import pandas as pdimport reverse_geocoder as rg

df = pd.read_csv ('reverse.csv')

This provides a dataframe:

      Unnamed:  0      email        lat     long
   0      1            a@email.com  42.36   -71.07  
   1      2            b@email.com  37.72   -122.44 
   2      3            c@email.com  40.83   -74.25
   3      4            d@email.com  41.70   -70.22
   4      5            f@email.com  34.27   -118.71

Realizing I need a few more columns, I add the following:

df['reversePair'] = list(zip(df.lat, df.long))

Which creates a datacolumn and has values such as (42.36, -71.07)

I then added additional columns:

df=df.assign(city="")
df=df.assign(state="")
df=df.assign(country="")

All's well up to this point. I then try to run the function row by row (there has to be a better way, especially since the package can take multiple tuples but I've struggled at this point already):

(Note for the below, limiting to two records for example purposes)

import itertools

n = 2

for item in itertools.islice(df['reversePair'], n):
    print(item) 
    x = rg.search(item)
    for i in x:
        print(i["name"])
        print(i["admin1"])
        print(i["cc"])
    
        df["city"] = i["name"]
        df["state"] = i["admin1"]
        df["country"] = i["cc"]
    
df.head(2)

This then gives me the following correct results, but in the dataset it is only adding (overwriting?) the last value returned into both records. It's driving me crazy.

    (42.36, -71.07)
    Boston
    Massachusetts
    US
    (37.72, -122.44)
    Daly City
    California
    US
      Unnamed: 0    email       lat     long    city    reversePair state     country
   0    1.0     a@email.com 42.36   -71.07  Daly City   (42.36, -71.07)   California    US
   1    2.0     b@email.com 37.72   -122.44 Daly City   (37.72, -122.44)  California    US

Row 0 should be city = Boston, state = Massachusetts, country = US (it is, but assuming it's from the wrong record).

Any ideas on where I'm going wrong with this would be much appreciated. Also, if anyone can see a way to make it faster, that would also be very much appreciated! Thank you!

Problem running a python function against a pandas dataframe and returning values into the dataframe

Answers (1)

Related Questions