ktorquem
ktorquem

Reputation: 55

Sum values from DataFrame into Parent Index - Python/Pandas

I'm working with Mint transaction data and trying to sum the values from each category into it's parent category.

I have a dataframe mint_data that is created from all my Mint transactions:

mint_data = tranactions_data.pivot(index='Category', columns='Date', values='Amount')

mint_data image

And a dict with Category:Parent pairs (this uses xlwings to pull from excel sheet)

cat_parent = cats_sheet.range('A1').expand().options(dict).value

Cat:Parent image

I'm not sure how to go about looping through the mint_data df and summing amounts into the parent category. I would like to keep the data frame format exactly the same, just replacing the parent values.

Here is an example df:

        A B C D E
par_a   0 0 5 0 0
cat1a   5 2 3 2 1
cat2a   0 1 2 1 0
par_b   1 0 1 1 2
cat1b   0 1 2 1 0
cat2b   1 1 1 1 1
cat3b   0 1 2 1 0

I also have a dict with

{'par_a': 'par_a',
 'cat1a': 'par_a',
 'cat2a': 'par_a',
 'par_b': 'par_b',
 'cat1b': 'par_b',
 'cat2b': 'par_b',
 'cat3b': 'par_b'}

I am trying to get the dataframe to end up with

        A B C D E
par_a   5 3 10 3 1
cat1a   5 2 3 2 1
cat2a   0 1 2 1 0
par_b   2 3 6 4 3
cat1b   0 1 2 1 0
cat2b   1 1 1 1 1
cat3b   0 1 2 1 0

Upvotes: 3

Views: 250

Answers (1)

JohnE
JohnE

Reputation: 30424

Let's call your dictionary "dct" and then make a new column that maps to the parent:

>>> df['parent'] = df.reset_index()['index'].map(dct).values

       A  B  C  D  E parent
par_a  0  0  5  0  0  par_a
cat1a  5  2  3  2  1  par_a
cat2a  0  1  2  1  0  par_a
par_b  1  0  1  1  2  par_b
cat1b  0  1  2  1  0  par_b
cat2b  1  1  1  1  1  par_b
cat3b  0  1  2  1  0  par_b

Then sum by parent:

>>> df_sum = df.groupby('parent').sum()

        A  B   C  D  E
parent                
par_a   5  3  10  3  1
par_b   2  3   6  4  3

In many cases you would stop there, but since you want to combine the parent/child data, you need some sort of merge. combine_first will work well here since it will selectively update in the direction you want:

>>> df_new = df_sum.combine_first(df)

         A    B     C    D    E parent
cat1a  5.0  2.0   3.0  2.0  1.0  par_a
cat1b  0.0  1.0   2.0  1.0  0.0  par_b
cat2a  0.0  1.0   2.0  1.0  0.0  par_a
cat2b  1.0  1.0   1.0  1.0  1.0  par_b
cat3b  0.0  1.0   2.0  1.0  0.0  par_b
par_a  5.0  3.0  10.0  3.0  1.0  par_a
par_b  2.0  3.0   6.0  4.0  3.0  par_b

You mentioned a multi-index in a comment, so you may prefer to organize it more like this:

>>> df_new.reset_index().set_index(['parent','index']).sort_index()

                A    B     C    D    E
parent index                          
par_a  cat1a  5.0  2.0   3.0  2.0  1.0
       cat2a  0.0  1.0   2.0  1.0  0.0
       par_a  5.0  3.0  10.0  3.0  1.0
par_b  cat1b  0.0  1.0   2.0  1.0  0.0
       cat2b  1.0  1.0   1.0  1.0  1.0
       cat3b  0.0  1.0   2.0  1.0  0.0
       par_b  2.0  3.0   6.0  4.0  3.0

Upvotes: 1

Related Questions