Amelio Vazquez-Reina
Amelio Vazquez-Reina

Reputation: 96264

Group computation between rows on condition

I have a dataframe like the following:

> df.head()

    channel   sym   quant      value    when
0   online    FTR   items   0.000515  before
1   video     FTR   items   0.000329   after
2   online    PAC   items   1.839338  before
3   video     PAC   items   2.355360   after
4   online    EPM   items   0.000947  before
5   test      EPM   items   0.000774   after
6   online    CLC   deals   0.000681  before
7   test      CLC   deals   0.000808   after
8   video     CLC   deals   0.000808   after
9   online    CPC   deals   1.620517  before

I am hoping to learn the difference in value from before to after for each unique combination of channel, sym and quant. How can I do this in Pandas?

I tried with:

def my_func(x):
  after_value  = x.ix[x['when']=='after','value']
  before_value = x.ix[x['when']=='before','value']
  return  after_value - before_value

df.groupby(['channel', 'sym', 'quant']).apply(my_func) 

but I got:

channel  sym  quant    
online   CLC  deals  6    NaN
                     8    NaN
         CPC  deals  10   NaN
         EPM  items  4    NaN
         FTR  items  0    NaN
         PAC  items  2    NaN
test     CLC  deals  7    NaN
         EPM  items  5    NaN
video    CLC  deals  9    NaN
         FTR  items  1    NaN
         PAC  items  3    NaN

which does not give me the desired result.

Upvotes: 0

Views: 33

Answers (1)

unutbu
unutbu

Reputation: 879103

Did you mean to group by ['sym', 'quant']? If so, then if you change the sign of the value whereever when equals before:

In [199]: df['value'] *= np.where(df['when'] == 'before', -1, 1)

In [200]: df
Out[200]: 
  channel  sym  quant     value    when
0  online  FTR  items -0.000515  before
1   video  FTR  items  0.000329   after
2  online  PAC  items -1.839338  before
3   video  PAC  items  2.355360   after
4  online  EPM  items -0.000947  before
5    test  EPM  items  0.000774   after
6  online  CLC  deals -0.000681  before
7    test  CLC  deals  0.000808   after
8   video  CLC  deals  0.000808   after
9  online  CPC  deals -1.620517  before

then you can find the difference by taking a sum:

In [202]: df.groupby(['sym', 'quant'])['value'].agg('sum')
Out[202]: 
sym  quant
CLC  deals    0.000935
CPC  deals   -1.620517
EPM  items   -0.000173
FTR  items   -0.000186
PAC  items    0.516022
Name: value, dtype: float64

Be careful that each group (with same sym and quant) contains only one before and after row; otherwise the sum might not be what you want. For example, if there is an after row, but no before, then the sum will equal the after value, as if the before value is 0. Indeed, that's what happens if you group by channel, sym and quant, since each group only contains one row:

In [201]: df.groupby(['channel', 'sym', 'quant'])['value'].agg('sum')
Out[201]: 
channel  sym  quant
online   CLC  deals   -0.000681
         CPC  deals   -1.620517
         EPM  items   -0.000947
         FTR  items   -0.000515
         PAC  items   -1.839338
test     CLC  deals    0.000808
         EPM  items    0.000774
video    CLC  deals    0.000808
         FTR  items    0.000329
         PAC  items    2.355360
Name: value, dtype: float64

Upvotes: 1

Related Questions