Reputation: 55
I'm working with Mint transaction data and trying to sum the values from each category into it's parent category.
I have a dataframe mint_data that is created from all my Mint transactions:
mint_data = tranactions_data.pivot(index='Category', columns='Date', values='Amount')
And a dict with Category:Parent pairs (this uses xlwings to pull from excel sheet)
cat_parent = cats_sheet.range('A1').expand().options(dict).value
I'm not sure how to go about looping through the mint_data df and summing amounts into the parent category. I would like to keep the data frame format exactly the same, just replacing the parent values.
Here is an example df:
A B C D E
par_a 0 0 5 0 0
cat1a 5 2 3 2 1
cat2a 0 1 2 1 0
par_b 1 0 1 1 2
cat1b 0 1 2 1 0
cat2b 1 1 1 1 1
cat3b 0 1 2 1 0
I also have a dict with
{'par_a': 'par_a',
'cat1a': 'par_a',
'cat2a': 'par_a',
'par_b': 'par_b',
'cat1b': 'par_b',
'cat2b': 'par_b',
'cat3b': 'par_b'}
I am trying to get the dataframe to end up with
A B C D E
par_a 5 3 10 3 1
cat1a 5 2 3 2 1
cat2a 0 1 2 1 0
par_b 2 3 6 4 3
cat1b 0 1 2 1 0
cat2b 1 1 1 1 1
cat3b 0 1 2 1 0
Upvotes: 3
Views: 250
Reputation: 30424
Let's call your dictionary "dct" and then make a new column that maps to the parent:
>>> df['parent'] = df.reset_index()['index'].map(dct).values
A B C D E parent
par_a 0 0 5 0 0 par_a
cat1a 5 2 3 2 1 par_a
cat2a 0 1 2 1 0 par_a
par_b 1 0 1 1 2 par_b
cat1b 0 1 2 1 0 par_b
cat2b 1 1 1 1 1 par_b
cat3b 0 1 2 1 0 par_b
Then sum by parent:
>>> df_sum = df.groupby('parent').sum()
A B C D E
parent
par_a 5 3 10 3 1
par_b 2 3 6 4 3
In many cases you would stop there, but since you want to combine the parent/child data, you need some sort of merge. combine_first
will work well here since it will selectively update in the direction you want:
>>> df_new = df_sum.combine_first(df)
A B C D E parent
cat1a 5.0 2.0 3.0 2.0 1.0 par_a
cat1b 0.0 1.0 2.0 1.0 0.0 par_b
cat2a 0.0 1.0 2.0 1.0 0.0 par_a
cat2b 1.0 1.0 1.0 1.0 1.0 par_b
cat3b 0.0 1.0 2.0 1.0 0.0 par_b
par_a 5.0 3.0 10.0 3.0 1.0 par_a
par_b 2.0 3.0 6.0 4.0 3.0 par_b
You mentioned a multi-index in a comment, so you may prefer to organize it more like this:
>>> df_new.reset_index().set_index(['parent','index']).sort_index()
A B C D E
parent index
par_a cat1a 5.0 2.0 3.0 2.0 1.0
cat2a 0.0 1.0 2.0 1.0 0.0
par_a 5.0 3.0 10.0 3.0 1.0
par_b cat1b 0.0 1.0 2.0 1.0 0.0
cat2b 1.0 1.0 1.0 1.0 1.0
cat3b 0.0 1.0 2.0 1.0 0.0
par_b 2.0 3.0 6.0 4.0 3.0
Upvotes: 1