Moran Reznik

Reputation: 1371

apply() returns a DataFrame instead of Series

In the following code:
import pandas as pd
import sqlite3
import math
import numpy
con = sqlite3.connect(r'C:\Python34\factbook.db')
facts = pd.read_sql_query('select * from facts;', con)
facts.dropna(inplace=True)
facts = facts[facts['area_land']!=0][:]
facts = facts[facts['population']!=0][:]
facts.reset_index(drop=True, inplace=True)
def pop_50(name):
    pop = facts[facts['name'] == name]['population']
    perc = facts[facts['name'] == name]['population_growth']
    new_pop = pop*(math.e**(35*perc))
    return new_pop


x=pd.Series(data=facts['name'])
z = x.apply(pop_50) 

x is a Series:

0                                        Afghanistan
1                                            Albania
2                                            Algeria
3                                            Andorra
4                                             Angola
5                                Antigua and Barbuda
6                                          Argentina
7                                            Armenia

and so on...

But z isn't. Here is a link for seeing what it is (a DataFrame): https://www.scribd.com/document/357697929/Doc1

I can't understand why. The pop_50 function gives back a single result (I tested it), so why is z a DataFrame? How can pop_50 return a Series? It takes a row (where facts['name'] == name), takes a single value from it (under the population column), and calls it pop. It then does the same for perc. new_pop is a mathematical combination of two single values, so it is a single value as well, and the function returns just that, doesn't it?

Thank you.

Upvotes: 3

Views: 1676

Answers (1)

piRSquared

Reputation: 294358

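pop_50 returns a pd.Series, not a scalar: facts[facts['name'] == name]['population'] selects with a boolean mask, and that always yields a Series, even when only one row matches. x.apply(pop_50) calls pop_50 once for every element of x, passing that element's value as the argument name. So for the first element of x you return a one-element Series, and again for the second, and so on. You end up with a Series of Series, which pandas assembles into a DataFrame. Each returned Series is labelled with the row index it came from in facts, so the index of x becomes the columns of your result.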
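
To make the explanation above concrete, here is a minimal sketch on a made-up three-row table. The frame contents and the helper name lookup_series are hypothetical stand-ins, not the asker's factbook data.

```python
import pandas as pd

# Hypothetical stand-in for the `facts` table (made-up numbers).
facts = pd.DataFrame({
    "name": ["A", "B", "C"],
    "population": [100, 200, 300],
})

def lookup_series(name):
    # Boolean-mask selection keeps the Series wrapper,
    # even when exactly one row matches.
    return facts[facts["name"] == name]["population"]

x = facts["name"]
z = x.apply(lookup_series)

print(type(lookup_series("A")).__name__)  # the per-element result type
print(type(z).__name__, z.shape)          # the combined result
print(z)
```

Each call returns a one-element Series labelled with a different row index, so the combined result should have one row per element of x and one column per distinct label, with NaN everywhere off the diagonal.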

Try this instead:

facts2 = facts.set_index('name')

def pop_50(name):
    pop = facts2.at[name, 'population']
    perc = facts2.at[name, 'population_growth']
    new_pop = pop*(math.e**(35*perc))
    return new_pop
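
As a quick check of the .at version, the following sketch runs it end to end on a made-up table. The data values are hypothetical, and it assumes that names are unique and that growth is stored as a fraction; with duplicate names, .at would no longer return a scalar.

```python
import math
import pandas as pd

# Hypothetical stand-in for the `facts` table (made-up numbers,
# growth assumed to be a fraction such as 0.01).
facts = pd.DataFrame({
    "name": ["A", "B", "C"],
    "population": [100, 200, 300],
    "population_growth": [0.01, 0.02, 0.03],
})

# Index by name so that .at can address a single cell by label.
facts2 = facts.set_index("name")

def pop_50(name):
    pop = facts2.at[name, "population"]          # scalar, not a Series
    perc = facts2.at[name, "population_growth"]  # scalar, not a Series
    return pop * (math.e ** (35 * perc))

z = facts["name"].apply(pop_50)
print(type(z).__name__)
print(z)
```

Because pop_50 now returns a scalar for each name, apply collects the results into a Series with the same index as the input.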

You can also use pd.Series.squeeze, which turns a one-element Series into a scalar:

def pop_50(name):
    pop = facts[facts['name'] == name]['population'].squeeze()
    perc = facts[facts['name'] == name]['population_growth'].squeeze()
    new_pop = pop*(math.e**(35*perc))
    return new_pop
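
One caveat worth knowing about squeeze, sketched here on made-up Series (the variable names are illustrative only): it unwraps a Series only when it holds exactly one element.

```python
import pandas as pd

one = pd.Series([5], index=[7])       # exactly one match
many = pd.Series([5, 6])              # several matches
empty = pd.Series([], dtype="int64")  # no match at all

print(one.squeeze())                   # the bare value
print(type(many.squeeze()).__name__)   # still a Series
print(type(empty.squeeze()).__name__)  # still a Series
```

So if a name is missing from facts or appears more than once, the squeeze version of pop_50 would again hand a Series back to apply.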

If for whatever reason you can't change pop_50, wrap it in a lambda:

z = x.apply(lambda name: pop_50(name).squeeze()) 

Upvotes: 1
