nekovolta
nekovolta

Reputation: 516

Group dataframe by similar rows

I have an excel file, with expenses that I need to group by the same concept. For simplicity, the dataframe would be something like this:

Date Concept Amount
07-12-2021 A 01 -100
01-12-2021 A 02 -150
30-11-2021 B 01 -50
29-11-2021 C 01 -25
23-11-2021 D 01 -70
22-11-2021 C 02 -80
19-11-2021 B 02 -65
18-11-2021 B 03 -25
17-11-2021 A 03 -30
16-11-2021 D 02 -120

The concept 'A 0X', refer to the same general concept 'A' but it has different identifiers. Bear in mind that concept 'A 0X' is for the sake of simplicity, but the concept will be a more complex expression

I am trying using this code to group by the same concept:

import pandas as pd
df = pd.read_excel("Expenses.xlsx")
df.groupby('Concept')['Amount'].apply(list)

The problem is that with this code I can not group all concepts 'A0X', 'B0X', 'C0X' as the same general concept 'A', 'B' or 'C'.

The final result I need would be:

Total expenses

Concept Amount
A 280
B 140
C 105
D 190

What I need is to know how to implement the groupby method with regex expressions.

Upvotes: 0

Views: 91

Answers (1)

sunnytown
sunnytown

Reputation: 1996

If you want to group by the first letter of the Concept column, you can use the pandas string functions in a groupby.

df_total = df.groupby(df['Concept'].str.split(' ').str[0]).sum().reset_index()

This returns:

>>> df_total
    Concept Amount
0   A       -280
1   B       -140
2   C       -105
3   D       -190

If you want the absolute value of the Amount column:

df_total['Amount'] = df_total['Amount'].abs()

Upvotes: 1

Related Questions