O Pardal
O Pardal

Reputation: 672

Pandas Pivot Table sum based on other column (as though had two indexes)

I have a database with transfers orders between two cities. I have, in each record, a departure date, the amount to be delivered, a returning date and the amount to be returned.

The database is something like this:

df = pd.DataFrame({"dep_date":[201701,201701,201702,201703], "del_amount":[100,200,300,400],"ret_date":[201703,201702,201703,201705], "ret_amount":[50,75,150,175]})

df
   dep_date  del_amount  ret_date  ret_amount
0    201701        100    201703         50
1    201701        200    201702         75
2    201702        300    201703        150
3    201703        400    201705        175

I want to get a pivot table with dep_data as index, showing the sum of del_amount in that month and the returned amount scheduled for the same month of departure date.

It's an odd construction, cause it seems to has two indexes. The result that I need is:

          del_amount  ret_amount
dep_date                      
201701          300          0
201702          300         75
201703          400        200

Note that some returning dates does not match with any departure month. Does anyone know if it is possible to build a proper aggfunc in pivot_table enviroment to achieve this? If it is not possible, can anyone tell me the best approach?

Thanks in advance

Upvotes: 1

Views: 1460

Answers (3)

cs95
cs95

Reputation: 402593

You'll need two groupby + sum operations, followed by a reindex and concatenation -

i = df.groupby(df.dep_date % 100)['del_amount'].sum()
j = df.groupby(df.ret_date % 100)['ret_amount'].sum()

pd.concat([i, j.reindex(i.index, fill_value=0)], 1)

          del_amount  ret_amount
dep_date                        
1                300           0
2                300          75
3                400         200

If you want to group on the entire date (and not just the month number), change df.groupby(df.dep_date % 100) to df.groupby('dep_date').

Upvotes: 3

BENY
BENY

Reputation: 323306

split it into two df, then we calculation for each of them , then we do join

s=df.loc[:,df.columns.str.startswith('de')]
v=df.loc[:,df.columns.str.startswith('ret')]

s.set_index('dep_date').sum(level=0).join(v.set_index('ret_date').sum(level=0)).fillna(0)
Out[449]: 
          del_amount  ret_amount
dep_date                        
201701           300         0.0
201702           300        75.0
201703           400       200.0

Upvotes: 3

Zero
Zero

Reputation: 76927

Use

In [97]: s1 = df.groupby('dep_date')['del_amount'].sum()

In [98]: s2 = df.groupby('ret_date')['ret_amount'].sum()

In [99]: s1.to_frame().join(s2.rename_axis('dep_date')).fillna(0)
Out[99]:
          del_amount  ret_amount
dep_date
201701           300         0.0
201702           300        75.0
201703           400       200.0

Upvotes: 3

Related Questions