Reputation: 226
I have two dataframes, one for individual transactions and another for the chart of accounts.
I'm trying to sum up all transactions for the last month (in this case, March) for each CompanyKey. I then want to add this result as a new column to the chart of accounts dataframe with the CompanyKey as the column header.
Here is a small sample of the transaction data (In reality there are thousands of transactions):
import pandas as pd
df = pd.DataFrame({
'CompanyKey': ["1","1","1","1","1","1","1","2","2","2"],
'DateOccurred': ["31/12/2021","25/02/2022","15/03/2022","31/03/2022","31/12/2021","22/02/2022","16/03/2022","31/12/2021","25/02/2022","31/03/2022"],
'Account.Name': ["Cash at Bank","Cash at Bank","Cash at Bank","Cash at Bank","GST Paid","GST Paid","GST Paid","Cash at Bank","Cash at Bank","Cash at Bank"],
'Amount': [150,112200,234065,19167.08,-39080.03,-10200,-27.5,15000,-234567,340697]})
Here are the corresponding chart of accounts:
df1 = pd.DataFrame({
'ConsolidatedAccountName': ["Cash at Bank","GST Paid", "Cash at Bank", "GST Paid"],
'Level 1': ["Fund Statement","Fund Statement", "Cash Flow Statement", "Cash Flow Statement"],
'Level 2': ["Cash at Bank","GST Paid", "Cash at Bank", "GST Paid"]})
This is my desired result. I only want the Sums to be applied to rows which have a df['Level 1'] == "Fund Statement"
.
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+
| ConsolidatedAccountName | Level 1 | Level 2 | Company 1 Sum | Company 2 Sum |
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+
| Cash at Bank | Fund Statement | Cash at Bank | 253,232.08 | 340,697 |
| GST Paid | Fund Statement | GST Paid | -27.50 | 0 |
| Cash at Bank | Cash Flow Statement | Cash at Bank | NaN | NaN |
| GST Paid | Cash Flow Statement | GST Paid | NaN | NaN |
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+
This is about as far as I got before running into issues.
company_keys = [1, 2]
for company in company_keys:
d1['Company 1 Sum'] = np.where((d3['CompanyKey'] == company) &
(d3['DateOccurred'] >= '01/03/2022') &
(d3['DateOccurred'] <= '31/03/2022') &
(d1['Level 1'] == 'Fund Statement'),
d3['Amount'].sum(),
0)
This is the error I get.
ValueError: Length of values (10) does not match length of index (4)
Upvotes: 2
Views: 139
Reputation: 13528
Here is one way to do it with Pandas groupby and apply:
# Setup
df["DateOccurred"] = pd.to_datetime(df["DateOccurred"], format="%d/%m/%Y")
# Sum transactions per companies and accounts
df_sum = (
df.loc[df["DateOccurred"].dt.month == 3, :]
.groupby(["CompanyKey", "Account.Name"])
.agg({"Amount": sum})
)
# Add new columns
for idx in df["CompanyKey"].unique():
df1[f"Company {idx} Sum"] = df1.apply(
lambda x: df_sum.loc[(idx, x["ConsolidatedAccountName"]), "Amount"]
if (x["ConsolidatedAccountName"] in df_sum.loc[(idx), :].index.unique())
and (x["Level 1"] == "Fund Statement")
else None,
axis=1,
)
# Cleanup
df1.loc[df1["Level 1"] == "Fund Statement"] = df1.loc[
df1["Level 1"] == "Fund Statement"
].fillna(0)
Then:
print(df1)
# Output
ConsolidatedAccountName Level 1 Level 2 Company 1 Sum Company 2 Sum
0 Cash at Bank Fund Statement Cash at Bank 253232.08 340697.0
1 GST Paid Fund Statement GST Paid -27.50 0.0
2 Cash at Bank Cash Flow Statement Cash at Bank NaN NaN
3 GST Paid Cash Flow Statement GST Paid NaN NaN
Upvotes: 1