Reputation: 399
I have a database from MaxMind which gives me location information from an IP address. I have written the functions below to retrieve the country and city from an IP:
import geoip2.database

def country(ipa):
    with geoip2.database.Reader('/home/jupyter/GeoIP2-City.mmdb') as reader:
        try:
            response = reader.city(ipa)
            return response.country.iso_code
        except Exception:
            return 'NA'

def city(ipa):
    with geoip2.database.Reader('/home/jupyter/GeoIP2-City.mmdb') as reader:
        try:
            response = reader.city(ipa)
            return response.city.name
        except Exception:
            return 'NA'
I run this every minute, applying the functions to a column raddr in pandas:
df['country'] = df['raddr'].apply(country)
df['city'] = df['raddr'].apply(city)
The problem is that it takes more than 3 minutes per iteration. Each iteration brings in around 150,000 rows, and I apply both functions to every one of them. I want to complete this operation in less than a minute. Any advice?
Upvotes: 0
Views: 215
Reputation: 10437
Your functions are not optimized: they open the database for every row they are applied to. Even MaxMind's GitHub README specifically notes that the reader object is expensive to create:
>>> # This creates a Reader object. You should use the same object
>>> # across multiple requests as creation of it is expensive.
What you should do is create the reader once and pass it to your functions as an extra argument:
def country(ipa, reader):
    try:
        response = reader.city(ipa)
        return response.country.iso_code
    except Exception:
        return 'NA'

def city(ipa, reader):
    try:
        response = reader.city(ipa)
        return response.city.name
    except Exception:
        return 'NA'
And then call your apply function with the extra keyword argument:
with geoip2.database.Reader('/home/jupyter/GeoIP2-City.mmdb') as reader:
    df['country'] = df['raddr'].apply(country, reader=reader)
    df['city'] = df['raddr'].apply(city, reader=reader)
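Since each row currently triggers two separate lookups (one per column), you could go further and fetch both values with a single reader.city call per row. A minimal sketch; the locate name is mine, not part of geoip2:

```python
def locate(ipa, reader):
    """Return (country ISO code, city name) with one database lookup."""
    try:
        response = reader.city(ipa)
        return response.country.iso_code, response.city.name
    except Exception:
        return 'NA', 'NA'

# Usage, assuming the same reader object as above:
# with geoip2.database.Reader('/home/jupyter/GeoIP2-City.mmdb') as reader:
#     df[['country', 'city']] = df['raddr'].apply(
#         lambda ipa: pd.Series(locate(ipa, reader)))
```

This halves the number of database reads on top of reusing the reader.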
Upvotes: 2