Reputation: 3379
I'm trying to figure out the best way to apply a function to groups within a Pandas dataframe where the function depends on the group.
Say I have the following dataframe:
>>> import numpy as np
>>> import pandas as pd
>>> df=pd.DataFrame(np.random.randint(50,200,9), columns=['Value'])
>>> df['Year']=[2001,2002,2003]*3
>>> df['Location']=['A','A','A','B','B','B','C','C','C']
>>> df.set_index(['Location','Year'], inplace=True)
>>> df
               Value
Location Year
A        2001    134
         2002    162
         2003    108
B        2001     59
         2002     52
         2003    124
C        2001    148
         2002    162
         2003     66
>>>
And that I have the following dictionary of values, specific to each year:
>>> YearDict={2001:1.3, 2002:1.2, 2003:1.1}
>>> YearDict
{2001: 1.3, 2002: 1.2, 2003: 1.1}
What would be the best way to multiply the 'Value' column in my dataframe by the year specific value in my dictionary?
Currently I do something like this:
>>> df.reset_index(inplace=True)
>>> def f(row):
...     return row['Value']*YearDict[row['Year']]
...
>>>
>>> df.apply(f, axis=1)
0 84.5
1 210.0
2 201.3
3 248.3
4 94.8
5 177.1
6 140.4
7 218.4
8 68.2
dtype: float64
>>>
Is this the best approach? Is there a method that does not require resetting the DataFrame index?
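A minimal sketch of one way to keep the row-wise `apply` without calling `reset_index`: with `axis=1`, each row is passed as a Series whose `.name` is that row's index label, which for a (Location, Year) MultiIndex is a tuple. The fixed values below are copied from the printed df, since the original used random data; `scaled` is just an illustrative name.

```python
import pandas as pd

YearDict = {2001: 1.3, 2002: 1.2, 2003: 1.1}

# Fixed copy of the values printed above (the question generated them randomly)
df = pd.DataFrame({
    "Location": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "Year": [2001, 2002, 2003] * 3,
    "Value": [134, 162, 108, 59, 52, 124, 148, 162, 66],
}).set_index(["Location", "Year"])

# With axis=1 each row's .name is its (Location, Year) tuple,
# so the year is row.name[1] and the index never has to be reset.
scaled = df.apply(lambda row: row["Value"] * YearDict[row.name[1]], axis=1)
print(scaled)
```

This still calls a Python function once per row, so it mainly removes the `reset_index` step rather than speeding anything up.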
Upvotes: 0
Views: 69
Reputation: 18521
Looks like I'm a little late to the party, but you could use groupby with transform. For example, given your raw df before you set the index (I named the location column Loc):
In [4]: df
Out[4]:
   Value  Year Loc
0     50  2001   A
1    141  2002   A
2     77  2003   A
3    143  2001   B
4    152  2002   B
5    123  2003   B
6    131  2001   C
7    196  2002   C
8    160  2003   C
In [5]: df.groupby('Year')['Value'].transform(lambda v: v * YearDict[v.name])
Out[5]:
0 65.0
1 169.2
2 84.7
3 185.9
4 182.4
5 135.3
6 170.3
7 235.2
8 176.0
Name: Value, dtype: float64
This works because transform calls the function once per group, passing that group's slice of the 'Value' Series, and sets the slice's name attribute to the group key (in this case the value of 'Year').
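To see the mechanism, here is a small sketch that records `v.name` on every call; `scale` and `seen_names` are hypothetical helper names added only for this illustration, and the data is the In [4] frame above.

```python
import pandas as pd

YearDict = {2001: 1.3, 2002: 1.2, 2003: 1.1}

# Same raw frame as In [4] above
df = pd.DataFrame({
    "Value": [50, 141, 77, 143, 152, 123, 131, 196, 160],
    "Year": [2001, 2002, 2003] * 3,
    "Loc": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
})

seen_names = []

def scale(v):
    # v is the 'Value' sub-Series for one year; v.name is that year (the group key)
    seen_names.append(v.name)
    return v * YearDict[v.name]

result = df.groupby("Year")["Value"].transform(scale)
print(sorted(set(seen_names)))  # [2001, 2002, 2003]
print(result)
```

Because transform returns a result aligned to the original rows, `result` can be assigned straight back as a new column.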
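Alternatively, given your DataFrame with the MultiIndex, you could just do the following. Selecting the 'Value' column first means each group arrives as a Series named after its year; on recent pandas versions, calling transform on the whole DataFrame can hand the function one column at a time, in which case v.name is 'Value' rather than the year.
In [6]: df2 = df.set_index(['Loc', 'Year'])
In [7]: df2.groupby(level=1)['Value'].transform(lambda v: v * YearDict[v.name])
Out[7]:
Loc  Year
A    2001     65.0
     2002    169.2
     2003     84.7
B    2001    185.9
     2002    182.4
     2003    135.3
C    2001    170.3
     2002    235.2
     2003    176.0
Name: Value, dtype: float64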
which I think is pretty compact and readable. The level=1 argument means to group by the second level of your MultiIndex, which in this case is 'Year'.
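A MultiIndex level can also be selected by name instead of position. The sketch below, which assumes the Loc/Year MultiIndex built in In [6], checks that `level=1` and `level="Year"` give the same result; the name-based form keeps working if the index levels are ever reordered.

```python
import pandas as pd

YearDict = {2001: 1.3, 2002: 1.2, 2003: 1.1}

df2 = pd.DataFrame({
    "Value": [50, 141, 77, 143, 152, 123, 131, 196, 160],
    "Year": [2001, 2002, 2003] * 3,
    "Loc": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
}).set_index(["Loc", "Year"])

def scale(v):
    # v.name is the year for this group
    return v * YearDict[v.name]

# level=1 picks the second index level by position...
by_position = df2.groupby(level=1)["Value"].transform(scale)
# ...and level="Year" picks the same level by its name
by_name = df2.groupby(level="Year")["Value"].transform(scale)

print(by_position.equals(by_name))  # True
```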
Upvotes: 1
Reputation: 60080
You can map a function over the index. Each row in the DataFrame has a (Location, Year) tuple as its index label, so you can do:
In [11]: df.index.map(lambda t: YearDict[t[1]])
Out[11]: array([ 1.3, 1.2, 1.1, 1.3, 1.2, 1.1, 1.3, 1.2, 1.1])
So multiplying by these values looks like:
In [12]: year_mults = df.index.map(lambda t: YearDict[t[1]])
In [13]: df['Value'] * year_mults
Out[13]:
Location  Year
A         2001    247.0
          2002    160.8
          2003    119.9
B         2001    102.7
          2002    182.4
          2003    202.4
C         2001     71.5
          2002    178.8
          2003    211.2
Name: Value, dtype: float64
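A self-contained sketch of the index-mapping idea, using the question's printed values rather than random data. It also shows a variant (not from the answer) that pulls out only the Year level with `get_level_values` and maps it through the dict directly, since `Index.map` accepts a dict. Converting the multipliers with `to_numpy()` is a choice made here to make the positional multiplication explicit; the variable names are illustrative.

```python
import pandas as pd

YearDict = {2001: 1.3, 2002: 1.2, 2003: 1.1}

df = pd.DataFrame({
    "Location": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "Year": [2001, 2002, 2003] * 3,
    "Value": [134, 162, 108, 59, 52, 124, 148, 162, 66],
}).set_index(["Location", "Year"])

# As in the answer: map each (Location, Year) tuple to its multiplier
year_mults = df.index.map(lambda t: YearDict[t[1]])
scaled = df["Value"] * year_mults.to_numpy()

# Variant: map only the Year level through the dict
year_mults_by_level = df.index.get_level_values("Year").map(YearDict)
scaled_by_level = df["Value"] * year_mults_by_level.to_numpy()

print(scaled.equals(scaled_by_level))  # True
```

Both forms keep the original MultiIndex on the result, so no reset_index is needed.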
Upvotes: 1