JungleDiff
JungleDiff

Reputation: 3505

Python Pandas: conditional subtraction

enter image description here

enter image description here

I want to perform conditional subtraction on the dataframe (shown as the first picture).

Basically, this is what I want to do:

  1. Subtract the values of col1 and col2 of food and clothing between me and you and create new rows for the differences.

Since the first row has 'food' and 'me' and the third row has 'food' and 'you', you subtract the values of col1 and col2 of the third row from the first row (300 - 600 = -300, and 200 - 500 = -300).

Since the second row has 'clothing' and 'me' and the fourth row has 'clothing' and 'you', you subtract the values of col1 and col2 of the fourth row from the second row (500 - 200 = 300 and 600 - 700 = -100).

How do I implement this using Pandas dataframe?

Upvotes: 2

Views: 3524

Answers (2)

piRSquared
piRSquared

Reputation: 294506

A way to do it with eval

df \
  .set_index(['type2', 'type1']).unstack().T \
  .eval('us = me - you', inplace=False) \
  .T.stack().reset_index()

  type2     type1  col1  col2
0    me  clothing   500   600
1    me      food   300   200
2   you  clothing   200   700
3   you      food   600   500
4    us  clothing   300  -100
5    us      food  -300  -300

Upvotes: 2

Scott Boston
Scott Boston

Reputation: 153510

You could do it this way using pd.concat, groupby and taking advantage of Pandas intrinsic alignment of data based on indexes:

Input df:

df = pd.DataFrame({'type1':['food','clothing','food','clothing'],'type2':['me','me','you','you'],'col1':[300,500,600,200],'col2':[200,600,500,700]})


pd.concat([df.set_index(['type1','type2'])
  .groupby('type1')
  .apply(lambda x: x.iloc[0]-x.iloc[1])
  .assign(type2='us')
  .set_index('type2', append=True),
  df.set_index(['type1','type2'])]).reset_index()

For Pandas older that 0.20.0

pd.concat([df.set_index(['type1','type2'])
  .groupby(level=0)
  .apply(lambda x: x.iloc[0]-x.iloc[1])
  .assign(type2='us')
  .set_index('type2', append=True),
  df.set_index(['type1','type2'])]).sort_index(level=[1,0]).reset_index()

Output:

      type1 type2  col1  col2
0  clothing    us   300  -100
1      food    us  -300  -300
2      food    me   300   200
3  clothing    me   500   600
4      food   you   600   500
5  clothing   you   200   700

Upvotes: 2

Related Questions