Simd
Simd

Reputation: 21333

How to filter aggregated data by total value

I have a dataframe with my finances in it. As a sample see:

{'Transaction Date': {0: Timestamp('2022-05-04 00:00:00'),
  1: Timestamp('2022-05-04 00:00:00'),
  2: Timestamp('2022-04-04 00:00:00'),
  3: Timestamp('2022-04-04 00:00:00'),
  4: Timestamp('2022-04-04 00:00:00'),
  5: Timestamp('2022-04-04 00:00:00'),
  6: Timestamp('2022-04-04 00:00:00'),
  7: Timestamp('2022-04-04 00:00:00'),
  8: Timestamp('2022-04-04 00:00:00'),
  9: Timestamp('2022-01-04 00:00:00')},
 'Transaction Description': {0: 'School',
  1: 'Cleaner',
  2: 'Taxi',
  3: 'shop',
  4: 'MOBILE',
  5: 'Restaurant',
  6: 'Restaurant',
  7: 'shop',
  8: 'Taxi',
  9: 'shop'},
 'Debit Amount': {0: 15.0,
  1: 26.0,
  2: 48.48,
  3: 9.18,
  4: 7.0,
  5: 10.05,
  6: 9.1,
  7: 2.14,
  8: 16.0,
  9: 11.68}

I can print a summary for each month with:

reportseries = df.assign(ym=pd.to_datetime(df['Transaction Date']).dt.strftime('%Y-%m')).groupby(['ym','Transaction Description' ] )['Debit Amount'].sum() 
print(reportseries)

This gives me:

ym       Transaction Description
2022-01  shop                       11.68
2022-04  MOBILE                      7.00
         Restaurant                 19.15
         Taxi                       64.48
         shop                       11.32
2022-05  Cleaner                    26.00
         School                     15.00
Name: Debit Amount, dtype: float64

I would like two things for each month:

First to show only those Transaction Descriptions where the sum of the Debit Amounts is at most 20 (for example), and also to add a Transaction Description for each month that is the difference between the total spent in that month and the sum of the values shown. That is how much was spend on lots of small things that aren't shown.

How can I do that?

This would mean I would only show:

2022-01  Small items                11.68 
2022-04  Taxi                       64.48
         Small items                37.47
2022-05  Cleaner                    26.00
         Small items                15.00

Upvotes: 1

Views: 51

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71687

Annotated code

# Pivot the dataframe with aggfunc=sum to calculate the total debit
# amounts per month and transaction description
df['ym'] = pd.to_datetime(df['Transaction Date']).dt.strftime('%Y-%m')
s = df.pivot_table('Debit Amount', 'ym', 'Transaction Description', aggfunc='sum')

# Mask the values < 20 to show transaction Descriptions
# where the sum of the Debit Amounts is greater than 20
s1 = s[s >= 20]

# Calculate difference of sums to get the amount spend on lots of small things
s1['Small items'] = s.sum(1) - s1.sum(1)

# Stack to reshape into multiindex series
s1 = s1.stack()

Result

ym       Transaction Description
2022-01  Small items                11.68
2022-04  Taxi                       64.48
         Small items                37.47
2022-05  Cleaner                    26.00
         Small items                15.00
dtype: float64

Upvotes: 1

Related Questions