Jered
Jered

Reputation: 226

How to sum up values in a dataframe and add them to another one?

I have two dataframes, one for individual transactions and another for the chart of accounts.

I'm trying to sum up all transactions for the last month (in this case, March) for each CompanyKey. I then want to add this result as a new column to the chart of accounts dataframe with the CompanyKey as the column header.

Here is a small sample of the transaction data (In reality there are thousands of transactions):

import pandas as pd

df = pd.DataFrame({
        'CompanyKey': ["1","1","1","1","1","1","1","2","2","2"],
        'DateOccurred': ["31/12/2021","25/02/2022","15/03/2022","31/03/2022","31/12/2021","22/02/2022","16/03/2022","31/12/2021","25/02/2022","31/03/2022"],
        'Account.Name': ["Cash at Bank","Cash at Bank","Cash at Bank","Cash at Bank","GST Paid","GST Paid","GST Paid","Cash at Bank","Cash at Bank","Cash at Bank"],
        'Amount': [150,112200,234065,19167.08,-39080.03,-10200,-27.5,15000,-234567,340697]})

Here are the corresponding chart of accounts:

df1 = pd.DataFrame({
            'ConsolidatedAccountName': ["Cash at Bank","GST Paid", "Cash at Bank", "GST Paid"],
            'Level 1': ["Fund Statement","Fund Statement", "Cash Flow Statement", "Cash Flow Statement"],
            'Level 2': ["Cash at Bank","GST Paid", "Cash at Bank", "GST Paid"]})

This is my desired result. I only want the Sums to be applied to rows which have a df['Level 1'] == "Fund Statement".

+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+
| ConsolidatedAccountName  | Level 1              | Level 2       | Company 1 Sum  | Company 2 Sum  |
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+
| Cash at Bank             | Fund Statement       | Cash at Bank  | 253,232.08     | 340,697        |
| GST Paid                 | Fund Statement       | GST Paid      | -27.50         | 0              |
| Cash at Bank             | Cash Flow Statement  | Cash at Bank  | NaN            | NaN            |
| GST Paid                 | Cash Flow Statement  | GST Paid      | NaN            | NaN            |
+──────────────────────────+──────────────────────+───────────────+────────────────+────────────────+

This is about as far as I got before running into issues.

company_keys = [1, 2]
    
for company in company_keys:
    d1['Company 1 Sum'] = np.where((d3['CompanyKey'] == company) &
                                       (d3['DateOccurred'] >= '01/03/2022') & 
                                       (d3['DateOccurred'] <= '31/03/2022') &
                                       (d1['Level 1'] == 'Fund Statement'),
                                        d3['Amount'].sum(),
                                        0)

This is the error I get.

    ValueError: Length of values (10) does not match length of index (4)

Upvotes: 2

Views: 139

Answers (1)

Laurent
Laurent

Reputation: 13528

Here is one way to do it with Pandas groupby and apply:

# Setup
df["DateOccurred"] = pd.to_datetime(df["DateOccurred"], format="%d/%m/%Y")

# Sum transactions per companies and accounts
df_sum = (
    df.loc[df["DateOccurred"].dt.month == 3, :]
    .groupby(["CompanyKey", "Account.Name"])
    .agg({"Amount": sum})
)

# Add new columns
for idx in df["CompanyKey"].unique():
    df1[f"Company {idx} Sum"] = df1.apply(
        lambda x: df_sum.loc[(idx, x["ConsolidatedAccountName"]), "Amount"]
        if (x["ConsolidatedAccountName"] in df_sum.loc[(idx), :].index.unique())
        and (x["Level 1"] == "Fund Statement")
        else None,
        axis=1,
    )

# Cleanup
df1.loc[df1["Level 1"] == "Fund Statement"] = df1.loc[
    df1["Level 1"] == "Fund Statement"
].fillna(0)

Then:

print(df1)
# Output

ConsolidatedAccountName              Level 1       Level 2  Company 1 Sum   Company 2 Sum
0          Cash at Bank       Fund Statement  Cash at Bank      253232.08        340697.0
1              GST Paid       Fund Statement      GST Paid         -27.50             0.0
2          Cash at Bank  Cash Flow Statement  Cash at Bank            NaN             NaN
3              GST Paid  Cash Flow Statement      GST Paid            NaN             NaN

Upvotes: 1

Related Questions