Reputation: 96264
I have a dataframe like the following:
> df.head()
channel sym quant value when
0 online FTR items 0.000515 before
1 video FTR items 0.000329 after
2 online PAC items 1.839338 before
3 video PAC items 2.355360 after
4 online EPM items 0.000947 before
5 test EPM items 0.000774 after
6 online CLC deals 0.000681 before
7 test CLC deals 0.000808 after
8 video CLC deals 0.000808 after
9 online CPC deals 1.620517 before
I am hoping to learn the difference in value from before
to after
for each unique combination of channel
, sym
and quant
. How can I do this in Pandas?
I tried with:
def my_func(x):
after_value = x.ix[x['when']=='after','value']
before_value = x.ix[x['when']=='before','value']
return after_value - before_value
df.groupby(['channel', 'sym', 'quant']).apply(my_func)
but I got:
channel sym quant
online CLC deals 6 NaN
8 NaN
CPC deals 10 NaN
EPM items 4 NaN
FTR items 0 NaN
PAC items 2 NaN
test CLC deals 7 NaN
EPM items 5 NaN
video CLC deals 9 NaN
FTR items 1 NaN
PAC items 3 NaN
which does not give me the desired result.
Upvotes: 0
Views: 33
Reputation: 879103
Did you mean to group by ['sym', 'quant']
? If so, then if you change the sign of the value whereever when
equals before
:
In [199]: df['value'] *= np.where(df['when'] == 'before', -1, 1)
In [200]: df
Out[200]:
channel sym quant value when
0 online FTR items -0.000515 before
1 video FTR items 0.000329 after
2 online PAC items -1.839338 before
3 video PAC items 2.355360 after
4 online EPM items -0.000947 before
5 test EPM items 0.000774 after
6 online CLC deals -0.000681 before
7 test CLC deals 0.000808 after
8 video CLC deals 0.000808 after
9 online CPC deals -1.620517 before
then you can find the difference by taking a sum:
In [202]: df.groupby(['sym', 'quant'])['value'].agg('sum')
Out[202]:
sym quant
CLC deals 0.000935
CPC deals -1.620517
EPM items -0.000173
FTR items -0.000186
PAC items 0.516022
Name: value, dtype: float64
Be careful that each group (with same sym
and quant
) contains only one before
and after
row; otherwise the sum might not be what you want. For example, if there is an after
row, but no before
, then the sum will equal the after
value, as if the before
value is 0. Indeed, that's what happens if you group by channel
, sym
and quant
, since each group only contains one row:
In [201]: df.groupby(['channel', 'sym', 'quant'])['value'].agg('sum')
Out[201]:
channel sym quant
online CLC deals -0.000681
CPC deals -1.620517
EPM items -0.000947
FTR items -0.000515
PAC items -1.839338
test CLC deals 0.000808
EPM items 0.000774
video CLC deals 0.000808
FTR items 0.000329
PAC items 2.355360
Name: value, dtype: float64
Upvotes: 1