abhi

Reputation: 399

Making apply fast in pandas

I have a database from MaxMind which gives me location information for an IP address. I have written the functions below to retrieve the city and country from the IP:

import geoip2.database
def country(ipa):
    with geoip2.database.Reader('/home/jupyter/GeoIP2-City.mmdb') as reader:
        try:
            response = reader.city(ipa)
            response = response.country.iso_code
            return response
        except:
            return 'NA'
        
def city(ipa):
    with geoip2.database.Reader('/home/jupyter/GeoIP2-City.mmdb') as reader:
        try:
            response = reader.city(ipa)
            response = response.city.name
            return response
        except:
            return 'NA'

I run this every minute, applying the functions to a column raddr in pandas:

df['country']=df['raddr'].apply(country)
df['city']=df['raddr'].apply(city)

The problem is that this takes more than 3 minutes to execute. In every iteration I get around 150,000 rows, and I apply the functions to each of them.

I want to complete this operation in less than a minute. Any advice?

Upvotes: 0

Views: 215

Answers (1)

Scratch'N'Purr

Reputation: 10437

Your functions are not optimized. Each call opens the database again, so the reader is created once per row for every column you fill. MaxMind's GitHub page specifically comments that the Reader object is expensive to create:

>>> # This creates a Reader object. You should use the same object
>>> # across multiple requests as creation of it is expensive.

What you should do is have your functions accept the reader as an extra argument:

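def country(ipa, reader):
    # Look up the IP with an already-open reader and return the ISO country code.
    try:
        return reader.city(ipa).country.iso_code
    except Exception:
        # Unknown or malformed address: fall back to a placeholder.
        return 'NA'

def city(ipa, reader):
    # Look up the IP with an already-open reader and return the city name.
    try:
        return reader.city(ipa).city.name
    except Exception:
        # Unknown or malformed address: fall back to a placeholder.
        return 'NA'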

Then create the reader once and pass it to apply as an extra keyword argument:

with geoip2.database.Reader('/home/jupyter/GeoIP2-City.mmdb') as reader:
    df['country'] = df['raddr'].apply(country, reader=reader)
    df['city'] = df['raddr'].apply(city, reader=reader)
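
If that is still too slow, a further option is to look up each distinct IP only once and reuse a single reader.city call for both columns. This only helps if raddr contains many repeated addresses, which is an assumption about your data. The sketch below is not from the geoip2 documentation; the helper names lookup_unique and add_geo_columns are hypothetical, and the only thing assumed about reader is that it has the same city(ip) method used above.

```python
def lookup_unique(ips, reader):
    """Return {ip: (country_iso_code, city_name)} with one lookup per distinct IP."""
    cache = {}
    for ip in set(ips):
        try:
            response = reader.city(ip)  # one lookup serves both fields
            cache[ip] = (response.country.iso_code, response.city.name)
        except Exception:
            # Unknown or malformed address: same placeholder as above.
            cache[ip] = ('NA', 'NA')
    return cache


def add_geo_columns(df, reader, column='raddr'):
    """Fill df['country'] and df['city'] from a per-IP cache (hypothetical helper)."""
    cache = lookup_unique(df[column].unique(), reader)
    missing = ('NA', 'NA')
    df['country'] = df[column].map(lambda ip: cache.get(ip, missing)[0])
    df['city'] = df[column].map(lambda ip: cache.get(ip, missing)[1])
    return df


# Usage, with the same database path as above:
# with geoip2.database.Reader('/home/jupyter/GeoIP2-City.mmdb') as reader:
#     df = add_geo_columns(df, reader)
```

With 150,000 rows per minute, the saving depends entirely on how many addresses repeat. In the worst case (all distinct) this still halves the number of lookups, because country and city come from the same response.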

Upvotes: 2
