Reputation: 672
I have a database with transfers orders between two cities. I have, in each record, a departure date, the amount to be delivered, a returning date and the amount to be returned.
The database is something like this:
df = pd.DataFrame({"dep_date":[201701,201701,201702,201703], "del_amount":[100,200,300,400],"ret_date":[201703,201702,201703,201705], "ret_amount":[50,75,150,175]})
df
dep_date del_amount ret_date ret_amount
0 201701 100 201703 50
1 201701 200 201702 75
2 201702 300 201703 150
3 201703 400 201705 175
I want to get a pivot table with dep_data as index, showing the sum of del_amount in that month and the returned amount scheduled for the same month of departure date.
It's an odd construction, cause it seems to has two indexes. The result that I need is:
del_amount ret_amount
dep_date
201701 300 0
201702 300 75
201703 400 200
Note that some returning dates does not match with any departure month. Does anyone know if it is possible to build a proper aggfunc in pivot_table enviroment to achieve this? If it is not possible, can anyone tell me the best approach?
Thanks in advance
Upvotes: 1
Views: 1460
Reputation: 402593
You'll need two groupby
+ sum
operations, followed by a reindex
and concat
enation -
i = df.groupby(df.dep_date % 100)['del_amount'].sum()
j = df.groupby(df.ret_date % 100)['ret_amount'].sum()
pd.concat([i, j.reindex(i.index, fill_value=0)], 1)
del_amount ret_amount
dep_date
1 300 0
2 300 75
3 400 200
If you want to group on the entire date (and not just the month number), change df.groupby(df.dep_date % 100)
to df.groupby('dep_date')
.
Upvotes: 3
Reputation: 323306
split it into two df, then we calculation for each of them , then we do join
s=df.loc[:,df.columns.str.startswith('de')]
v=df.loc[:,df.columns.str.startswith('ret')]
s.set_index('dep_date').sum(level=0).join(v.set_index('ret_date').sum(level=0)).fillna(0)
Out[449]:
del_amount ret_amount
dep_date
201701 300 0.0
201702 300 75.0
201703 400 200.0
Upvotes: 3
Reputation: 76927
Use
In [97]: s1 = df.groupby('dep_date')['del_amount'].sum()
In [98]: s2 = df.groupby('ret_date')['ret_amount'].sum()
In [99]: s1.to_frame().join(s2.rename_axis('dep_date')).fillna(0)
Out[99]:
del_amount ret_amount
dep_date
201701 300 0.0
201702 300 75.0
201703 400 200.0
Upvotes: 3