data_person

Reputation: 4470

Difference between group sums in pandas

DF:

    fruits     date      amount
0   Apple   2018-01-01   100
1   Orange  2018-01-01   200
2   Apple   2018-01-01   150
3   Apple   2018-01-02   100
4   Orange  2018-01-02   100
5   Orange  2018-01-02   100

Code to create this:

import pandas as pd

f = [["Apple", "2018-01-01", 100], ["Orange", "2018-01-01", 200], ["Apple", "2018-01-01", 150],
     ["Apple", "2018-01-02", 100], ["Orange", "2018-01-02", 100], ["Orange", "2018-01-02", 100]]
df = pd.DataFrame(f, columns=["fruits", "date", "amount"])

I am trying to aggregate the sales of each fruit for each date and find the difference between the sums.

Expected output:

date          diff
2018-01-01      50
2018-01-02    -100

That is, sum the sales of Apple and of Orange for each date, then subtract the Orange sum from the Apple sum.

I am able to find the sum:

df.groupby(["date","fruits"])["amount"].agg("sum") 

date        fruits
2018-01-01  Apple     250
            Orange    200
2018-01-02  Apple     100
            Orange    200
Name: amount, dtype: int64

Any suggestions on how to find the difference within pandas itself?

Upvotes: 0

Views: 320

Answers (3)

jezrael

Reputation: 862511

Add unstack to reshape the fruits into columns, then subtract them, using pop to extract (and remove) each column:

df = df.groupby(["date","fruits"])["amount"].sum().unstack()
df['diff'] = df.pop('Apple') - df.pop('Orange')
print (df)
fruits      diff
date            
2018-01-01    50
2018-01-02  -100
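
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same reshape-and-subtract idea. It leaves the original df untouched by writing to new variables (the names sums and result are only illustrative), and it passes fill_value=0 to unstack on the assumption that a fruit with no sales on a given date should count as zero.

```python
import pandas as pd

f = [["Apple", "2018-01-01", 100], ["Orange", "2018-01-01", 200], ["Apple", "2018-01-01", 150],
     ["Apple", "2018-01-02", 100], ["Orange", "2018-01-02", 100], ["Orange", "2018-01-02", 100]]
df = pd.DataFrame(f, columns=["fruits", "date", "amount"])

# One row per date, one column per fruit; assumption: missing combinations count as 0
sums = df.groupby(["date", "fruits"])["amount"].sum().unstack(fill_value=0)

# Apple minus Orange per date, returned as a two-column frame (date, diff)
result = (sums["Apple"] - sums["Orange"]).rename("diff").reset_index()
print(result)
```

Selecting the columns by label instead of using pop keeps the per-fruit sums available for further checks.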

Upvotes: 1

Space Impact

Reputation: 13255

Group by date and apply a lambda function that subtracts the Orange sum from the Apple sum:

df.groupby("date").apply(lambda x: x.loc[x['fruits']=='Apple','amount'].sum() - 
                                   x.loc[x['fruits']=='Orange','amount'].sum())

date
2018-01-01     50
2018-01-02   -100
dtype: int64

Or grouping the fruits separately and finding the difference:

A = df[df.fruits.isin(['Apple'])].groupby('date')['amount'].sum()
O = df[df.fruits.isin(['Orange'])].groupby('date')['amount'].sum()

A - O
date
2018-01-01     50
2018-01-02   -100
Name: amount, dtype: int64
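
One thing to watch with two separately grouped Series: the subtraction aligns on the date index, so a date that appears for only one fruit comes out as NaN. A small sketch of that behaviour, using hypothetical data (2018-01-03 has Apple sales only) and assuming a missing fruit should be treated as zero sales:

```python
import pandas as pd

# Hypothetical data for illustration: no Orange sales on 2018-01-03
f = [["Apple", "2018-01-01", 100], ["Orange", "2018-01-01", 200],
     ["Apple", "2018-01-03", 70]]
df = pd.DataFrame(f, columns=["fruits", "date", "amount"])

A = df[df.fruits == "Apple"].groupby("date")["amount"].sum()
O = df[df.fruits == "Orange"].groupby("date")["amount"].sum()

plain = A - O                     # NaN for dates missing on either side
filled = A.sub(O, fill_value=0)   # a missing side is treated as 0
print(plain)
print(filled)
```

Whether zero is the right fill depends on the data; if a missing date means "unknown" rather than "no sales", the NaN from the plain subtraction may be the more honest result.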

Upvotes: 1

iDrwish

Reputation: 3103

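def get_diff(grp):
    # Sum the amounts per fruit within this date group
    sums = grp.groupby('fruits')['amount'].sum()
    # Select by label, not position, so the result does not depend on sort order
    return sums['Apple'] - sums['Orange']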

df.groupby('date').apply(get_diff)

Output

date
2018-01-01     50
2018-01-02   -100

Upvotes: 1
