Reputation: 516
I have an excel file, with expenses that I need to group by the same concept. For simplicity, the dataframe would be something like this:
Date | Concept | Amount |
---|---|---|
07-12-2021 | A 01 | -100 |
01-12-2021 | A 02 | -150 |
30-11-2021 | B 01 | -50 |
29-11-2021 | C 01 | -25 |
23-11-2021 | D 01 | -70 |
22-11-2021 | C 02 | -80 |
19-11-2021 | B 02 | -65 |
18-11-2021 | B 03 | -25 |
17-11-2021 | A 03 | -30 |
16-11-2021 | D 02 | -120 |
The concept 'A 0X', refer to the same general concept 'A' but it has different identifiers. Bear in mind that concept 'A 0X' is for the sake of simplicity, but the concept will be a more complex expression
I am trying using this code to group by the same concept:
import pandas as pd
df = pd.read_excel("Expenses.xlsx")
df.groupby('Concept')['Amount'].apply(list)
The problem is that with this code I can not group all concepts 'A0X', 'B0X', 'C0X' as the same general concept 'A', 'B' or 'C'.
The final result I need would be:
Total expenses
Concept | Amount |
---|---|
A | 280 |
B | 140 |
C | 105 |
D | 190 |
What I need is to know how to implement the groupby method with regex expressions.
Upvotes: 0
Views: 91
Reputation: 1996
If you want to group by the first letter of the Concept column, you can use the pandas string functions in a groupby.
df_total = df.groupby(df['Concept'].str.split(' ').str[0]).sum().reset_index()
This returns:
>>> df_total
Concept Amount
0 A -280
1 B -140
2 C -105
3 D -190
If you want the absolute value of the Amount column:
df_total['Amount'] = df_total['Amount'].abs()
Upvotes: 1