heaton124
heaton124

Reputation: 79

Python: Sum values in DataFrame if other values match between DataFrames

I have two dataframes of different length like those:

DataFrame A:

FirstName    LastName
Adam         Smith
John         Johnson

DataFrame B:

First        Last        Value
Adam         Smith       1.2
Adam         Smith       1.5
Adam         Smith       3.0
John         Johnson     2.5

Imagine that what I want to do is to create a new column in "DataFrame A" summing all the values with matching last names, so the output in "A" would be:

FirstName    LastName    Sums
Adam         Smith       5.7
John         Johnson     2.5

If I were in Excel, I'd use

=SUMIF(dfB!B:B, B2, dfB!C:C)

In Python I've been trying multiple solutions but using both np.where, df.sum(), dropping indexes etc., but I'm lost. Below code is returning "ValueError: Can only compare identically-labeled Series objects", but I don't think it's written correctly anyways.

df_a['Sums'] = df_a[df_a['LastName'] == df_b['Last']].sum()['Value']

Huge thanks in advance for any help.

Upvotes: 2

Views: 1387

Answers (3)

ansev
ansev

Reputation: 30920

Use DataFrame.merge + DataFrame.groupby:

new_df=( dfa.merge(dfb.groupby(['First','Last'],as_index=False).Value.sum() ,
                   left_on='LastName',right_on='Last',how='left')
            .drop('Last',axis=1) )
print(new_df)

to join for both columns:

new_df=( dfa.merge(dfb.groupby(['First','Last'],as_index=False).Value.sum() ,
              left_on=['FirstName','LastName'],right_on=['First','Last'],how='left')
            .drop(['First','Last'],axis=1) )
print(new_df)

Output:

  FirstName LastName  Value
0      Adam    Smith    5.7
1      John  Johnson    2.5

Upvotes: 1

Sreeram TP
Sreeram TP

Reputation: 11917

df_b_a = (pd.merge(df_b, df_a, left_on=['FirstName', 'LastName'], right_on=['First', 'Last'], how='left')
                .groupby(by=['First', 'Last'], as_index=False)['Value'].sum())

print(df_b_a)

    First   Last    Value
0   Adam    Smith   5.7
1   John    Johnson     2.5

Upvotes: 1

jezrael
jezrael

Reputation: 862731

Use boolean indexing with Series.isin for filtering and then aggregate sum:

df = (df_b[df_b['Last'].isin(df_a['LastName'])]
           .groupby(['First','Last'], as_index=False)['Value']
           .sum())

If want match both, first and last name:

df = (df_b.merge(df_a, left_on=['First','Last'], right_on=['FirstName','LastName'])
           .groupby(['First','Last'], as_index=False)['Value']
           .sum())

Upvotes: 2

Related Questions