cmp
cmp

Reputation: 568

Pandas multi-index dataframe cumulative sum

I have a multi-index dataframe below:

vessels_df.head(10)

                             eta_date
vessel           load_dates  
AM OCEAN SILVER  2020-06-05  2020-06-04
                 2020-06-06 
                 2020-06-07 
                 2020-06-08 
                 2020-06-09 
                 2020-06-10 
                 2020-06-11 
APJ ANGAD 
                 2020-06-09  2020-06-08
                 2020-06-10 
                 2020-06-11 
AQUATONKA       
                 2020-06-03  2020-06-02
                 2020-06-04 
                 2020-06-05 
                 2020-06-06 
                 2020-06-07 
                 2020-06-08 
                 2020-06-09 
                 2020-06-10 
                 2020-06-11 

and a dictionary with a list of daily charges that are incurred for every day beyond the FIRST day of the load_dates

demurrage_charges_dict = {
    'AM OCEAN SILVER': 11076,
    'APJ ANGAD': 21771,
    'AQUATONKA': 14312
}

desired output

I would like to create a column that is the cumulative sum of the daily charge over the period in the index, for example:

                            eta_date      demurrage_charges
vessel           load_dates  
AM OCEAN SILVER  2020-06-05  2020-06-04   0
                 2020-06-06               11,076
                 2020-06-07               22,152
                 2020-06-08               33,228
                 2020-06-09               44,304
                 2020-06-10               55,380
                 2020-06-11               66,456

I believe I could reset the index of the 'vessels_df', convert the demurrage_charges_dict to df and merge the two then use pd.cumsum(), but wondered if there is a more elegant way to perform this?

Much appreciated.

Upvotes: 0

Views: 200

Answers (1)

ALollz
ALollz

Reputation: 59549

cumcount the vessel index level and multiply that by the mapping of the vessel with the dict

idx = df.index.get_level_values('vessel')
df['demurrage_charges'] = (df.groupby(idx).cumcount()
                           * idx.map(demurrage_charges_dict))

                              eta_date  demurrage_charges
vessel          load_dates                               
AM OCEAN SILVER 2020-06-05  2020-06-04                  0
                2020-06-06        None              11076
                2020-06-07        None              22152
                2020-06-08        None              33228
                2020-06-09        None              44304
                2020-06-10        None              55380
                2020-06-11        None              66456
APJ ANGAD       2020-06-09  2020-06-08                  0
                2020-06-10        None              21771
                2020-06-11        None              43542
AQUATONKA       2020-06-03  2020-06-02                  0
                2020-06-04        None              14312
                2020-06-05        None              28624
                2020-06-06        None              42936
                2020-06-07        None              57248
                2020-06-08        None              71560
                2020-06-09        None              85872
                2020-06-10        None             100184
                2020-06-11        None             114496

Upvotes: 3

Related Questions