Reputation: 4470
DF:
fruits date amount
0 Apple 2018-01-01 100
1 Orange 2018-01-01 200
2 Apple 2018-01-01 150
3 Apple 2018-01-02 100
4 Orange 2018-01-02 100
5 Orange 2018-01-02 100
Code to create this:
import pandas as pd

f = [["Apple","2018-01-01",100],["Orange","2018-01-01",200],["Apple","2018-01-01",150],
["Apple","2018-01-02",100],["Orange","2018-01-02",100],["Orange","2018-01-02",100]]
df = pd.DataFrame(f,columns = ["fruits","date","amount"])
I am trying to aggregate the sales of each fruit per date and then find the difference between the two sums.
Expected output:
date diff
2018-01-01 50
2018-01-02 -100
That is, for each date, sum the sales of Apple and of Orange separately, then subtract the Orange sum from the Apple sum.
I am able to find the sum:
df.groupby(["date","fruits"])["amount"].agg("sum")
date        fruits
2018-01-01  Apple     250
            Orange    200
2018-01-02  Apple     100
            Orange    200
Name: amount, dtype: int64
Any suggestions on how to find the difference within pandas itself?
Upvotes: 0
Views: 320
Reputation: 862511
Add unstack to reshape the fruits into columns, then subtract, using pop to extract the columns:
df = df.groupby(["date","fruits"])["amount"].sum().unstack()
df['diff'] = df.pop('Apple') - df.pop('Orange')
print (df)
fruits diff
date
2018-01-01 50
2018-01-02 -100
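The unstack approach above leaves NaN for any date on which one of the fruits has no rows. A minimal sketch of a variant using pivot_table with fill_value=0 follows; treating a missing fruit as zero sales is an assumption, not something stated in the question, and the names totals and result are only illustrative.

```python
import pandas as pd

f = [["Apple", "2018-01-01", 100], ["Orange", "2018-01-01", 200],
     ["Apple", "2018-01-01", 150], ["Apple", "2018-01-02", 100],
     ["Orange", "2018-01-02", 100], ["Orange", "2018-01-02", 100]]
df = pd.DataFrame(f, columns=["fruits", "date", "amount"])

# One row per date, one column per fruit. fill_value=0 treats a fruit
# with no rows on a date as zero sales (an assumption for this sketch).
totals = df.pivot_table(index="date", columns="fruits", values="amount",
                        aggfunc="sum", fill_value=0)

# Build a separate result frame so the original df is left untouched.
result = (totals["Apple"] - totals["Orange"]).rename("diff").reset_index()
print(result)
```

Unlike the snippet above, this does not overwrite df, so the original rows stay available for further work.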
Upvotes: 1
Reputation: 13255
Use groupby on date and apply with a lambda function:
df.groupby("date").apply(lambda x: x.loc[x['fruits']=='Apple','amount'].sum() -
x.loc[x['fruits']=='Orange','amount'].sum())
date
2018-01-01 50
2018-01-02 -100
dtype: int64
Or grouping the fruits separately and finding the difference:
A = df[df.fruits.isin(['Apple'])].groupby('date')['amount'].sum()
O = df[df.fruits.isin(['Orange'])].groupby('date')['amount'].sum()
A - O
date
2018-01-01 50
2018-01-02 -100
Name: amount, dtype: int64
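Both variants in this answer compute an Apple total and an Orange total per date and then subtract. As a rough sketch of the same idea without apply or two separate filters, the amounts can be given a sign per fruit and summed once per date. The sign mapping and the variable names are illustrative assumptions; any fruit not in the mapping becomes NaN and is skipped by sum.

```python
import pandas as pd

f = [["Apple", "2018-01-01", 100], ["Orange", "2018-01-01", 200],
     ["Apple", "2018-01-01", 150], ["Apple", "2018-01-02", 100],
     ["Orange", "2018-01-02", 100], ["Orange", "2018-01-02", 100]]
df = pd.DataFrame(f, columns=["fruits", "date", "amount"])

# +1 for Apple, -1 for Orange, so a plain sum yields Apple minus Orange.
sign = df["fruits"].map({"Apple": 1, "Orange": -1})

# Group the signed amounts by date and add them up.
diff = (df["amount"] * sign).groupby(df["date"]).sum()
print(diff)
```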
Upvotes: 1
Reputation: 3103
def get_diff(grp):
    # Sum per fruit; groups sort alphabetically, so index 0 is Apple and index 1 is Orange
    sums = grp.groupby('fruits')['amount'].sum().values
    return sums[0] - sums[1]

df.groupby('date').apply(get_diff)
Output
date
2018-01-01 50
2018-01-02 -100
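The function above picks the sums by position, which relies on the fruits sorting as Apple then Orange and on both being present for every date. A hedged sketch that looks the sums up by label instead is shown below; the parameters first and second and the default of 0 for a missing fruit are assumptions added for illustration.

```python
import pandas as pd

f = [["Apple", "2018-01-01", 100], ["Orange", "2018-01-01", 200],
     ["Apple", "2018-01-01", 150], ["Apple", "2018-01-02", 100],
     ["Orange", "2018-01-02", 100], ["Orange", "2018-01-02", 100]]
df = pd.DataFrame(f, columns=["fruits", "date", "amount"])

def get_diff(grp, first="Apple", second="Orange"):
    # Sum per fruit, then look each fruit up by label. A fruit absent
    # on this date counts as 0 (an assumption for this sketch).
    sums = grp.groupby("fruits")["amount"].sum()
    return sums.get(first, 0) - sums.get(second, 0)

# Select only the columns the function needs before applying it per date.
diff = df.groupby("date")[["fruits", "amount"]].apply(get_diff)
print(diff)
```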
Upvotes: 1