AJG519

Reputation: 3379

Apply group specific function to groups in Pandas

I'm trying to figure out the best way to apply a function to groups within a Pandas dataframe where the function depends on the group.

Say I have the following dataframe:

>>> df=pd.DataFrame(np.random.randint(50,200,9), columns=['Value'])
>>> df['Year']=[2001,2002,2003]*3
>>> df['Location']=['A','A','A','B','B','B','C','C','C']
>>> df.set_index(['Location','Year'], inplace=True)
>>> df
               Value
Location Year       
A        2001    134
         2002    162
         2003    108
B        2001     59
         2002     52
         2003    124
C        2001    148
         2002    162
         2003     66
>>> 

And that I have the following dictionary of values, specific to each year:

>>> YearDict={2001:1.3, 2002:1.2, 2003:1.1}
>>> YearDict
{2001: 1.3, 2002: 1.2, 2003: 1.1}

What would be the best way to multiply the 'Value' column in my dataframe by the year specific value in my dictionary?

Currently I do something like this:

>>> df.reset_index(inplace=True)
>>> def f(row):
...     return row['Value']*YearDict[row['Year']]
... 
>>> 
>>> df.apply(f, axis=1)
0     84.5
1    210.0
2    201.3
3    248.3
4     94.8
5    177.1
6    140.4
7    218.4
8     68.2
dtype: float64
>>> 

Is this the best approach? Is there a method that does not require resetting the dataframe index?

Upvotes: 0

Views: 69

Answers (2)

wflynny

Reputation: 18521

Looks like I'm a little late to the party, but you could use groupby.transform. For example, given your raw df before you set the index:

In [4]: df
Out[4]: 
   Value  Year Loc
0     50  2001   A
1    141  2002   A
2     77  2003   A
3    143  2001   B
4    152  2002   B
5    123  2003   B
6    131  2001   C
7    196  2002   C
8    160  2003   C

In [5]: df.groupby('Year')['Value'].transform(lambda v: v * YearDict[v.name])
Out[5]: 
0     65.0
1    169.2
2     84.7
3    185.9
4    182.4
5    135.3
6    170.3
7    235.2
8    176.0
Name: Value, dtype: float64

This works because transform passes each group's values to the function as a Series whose name attribute is the group key (in this case, the value of 'Year').
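As a cross-check on the v.name behaviour, here is a minimal self-contained sketch that rebuilds the frame shown above and compares the transform result with a plain Series.map lookup on the 'Year' column. The names year_dict, via_transform and via_map are only illustrative; the map variant is an alternative I'm adding, not part of the approach above.

```python
import pandas as pd

# Multipliers per year, same values as YearDict in the question
year_dict = {2001: 1.3, 2002: 1.2, 2003: 1.1}

# Same data as the un-indexed frame printed in this answer
df = pd.DataFrame({
    "Value": [50, 141, 77, 143, 152, 123, 131, 196, 160],
    "Year": [2001, 2002, 2003] * 3,
    "Loc": list("AAABBBCCC"),
})

# groupby.transform: each group arrives as a Series whose .name is the group key
via_transform = df.groupby("Year")["Value"].transform(lambda v: v * year_dict[v.name])

# Alternative (assumption: a row-wise lookup is acceptable): map each year to its multiplier
via_map = df["Value"] * df["Year"].map(year_dict)

print(via_transform.round(1).tolist())
print(via_map.round(1).tolist())
```

Both lines should print the same values; the map version skips the groupby entirely, which may be simpler when the multiplier depends only on a single column.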

Alternatively, given your DataFrame with the MultiIndex, you could just do:

In [6]: df2 = df.set_index(['Loc', 'Year'])

In [7]: df2.groupby(level=1).transform(lambda v: v * YearDict[v.name])
Out[7]: 
          Value
Loc Year       
A   2001   65.0
    2002  169.2
    2003   84.7
B   2001  185.9
    2002  182.4
    2003  135.3
C   2001  170.3
    2002  235.2
    2003  176.0

which I think is pretty compact and readable. The level=1 argument groups by the second level of your MultiIndex, which in this case is 'Year'.
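A small variation, sketched under the assumption that the index levels are named as above: the level can be referred to by name rather than position, and selecting the 'Value' column before calling transform keeps the function working on one Series per group. Depending on the pandas version, calling transform on the whole DataFrame may hand the function individual columns whose name is the column label rather than the year, so the column selection is the more defensive form. The names year_dict and scaled are illustrative.

```python
import pandas as pd

# Multipliers per year, same values as YearDict in the question
year_dict = {2001: 1.3, 2002: 1.2, 2003: 1.1}

# Rebuild the MultiIndexed frame from this answer
df2 = pd.DataFrame({
    "Value": [50, 141, 77, 143, 152, 123, 131, 196, 160],
    "Year": [2001, 2002, 2003] * 3,
    "Loc": list("AAABBBCCC"),
}).set_index(["Loc", "Year"])

# Group by the index level called 'Year'; v.name is the year of each group
scaled = df2.groupby(level="Year")["Value"].transform(lambda v: v * year_dict[v.name])

print(scaled)
```

The result keeps the original (Loc, Year) index, so it can be assigned straight back with df2['Scaled'] = scaled if a new column is wanted.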

Upvotes: 1

Marius

Reputation: 60080

You can map a function over the index. Each row in the dataframe has a (Location, Year) tuple as its index, so you can do:

df.index.map(lambda t: YearDict[t[1]])
Out[11]: array([ 1.3,  1.2,  1.1,  1.3,  1.2,  1.1,  1.3,  1.2,  1.1])

So multiplying by these values looks like:

year_mults = df.index.map(lambda t: YearDict[t[1]])

df['Value'] * year_mults
Out[13]: 
Location  Year
A         2001    247.0
          2002    160.8
          2003    119.9
B         2001    102.7
          2002    182.4
          2003    202.4
C         2001     71.5
          2002    178.8
          2003    211.2
Name: Value, dtype: float64
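The same idea can be sketched without indexing into tuples: pull out the 'Year' level with get_level_values and map the dictionary over it directly. This is a variation on the approach above, not a replacement for it; the frame below uses the values printed in the question, and year_dict, year_mults and result are illustrative names.

```python
import pandas as pd

# Multipliers per year, same values as YearDict in the question
year_dict = {2001: 1.3, 2002: 1.2, 2003: 1.1}

# The MultiIndexed frame as printed in the question
df = pd.DataFrame({
    "Value": [134, 162, 108, 59, 52, 124, 148, 162, 66],
    "Year": [2001, 2002, 2003] * 3,
    "Location": list("AAABBBCCC"),
}).set_index(["Location", "Year"])

# Take the 'Year' level and look each year up in the dictionary;
# to_numpy() yields a plain array, so the multiplication below is positional
year_mults = df.index.get_level_values("Year").map(year_dict).to_numpy()

result = df["Value"] * year_mults
print(result)
```

Referring to the level by name means the code does not depend on where 'Year' sits in the index, and no reset_index is needed.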

Upvotes: 1
