Leo Yang
Leo Yang

Reputation: 29

Pandas check if the sum of two columns within the same group equals

I have a purchase record table showing the purchasing history recorded by customers and the store. I would like to get a new column where it shows the sum of the customer records & store records from each customer, as a group, equals. So basically 5 + 3 = 5 + 3 so Yes for 001 but 2 != 3 so No for 002.

Date custID itmID cust rec store rec
0514 001 A 5 5
0514 001 B 3 3
0514 002 A 2 3

expected result:

Date custID itmID cust rec store rec sum match
0514 001 A 5 5 Yes
0514 001 B 3 3 Yes
0514 002 A 2 3 No

my code:

df['sum match'] = (sum(df['store rec'])==sum(df['cust rec'])).groupby(match.custID)

error shows 'bool' object has no attribute 'groupby'

Upvotes: 1

Views: 765

Answers (1)

anky
anky

Reputation: 75080

You can groupby and transform sum and then check difference of the columns on axis=1. then get the last result to check if it is 0 and conditionally assign values to the new column:

c = (df.groupby("custID")[['cust rec','store rec']]
      .transform('sum').diff(axis=1).iloc[:,-1].eq(0))

df['sum match'] = np.where(c,"Yes",'No')

output:

print(df)

   Date  custID itmID  cust rec  store rec sum match
0   514       1     A         5          5       Yes
1   514       1     B         3          3       Yes
2   514       2     A         2          3        No

More details of the steps:

Groupby + transform sum returns sum of each group per column:

print(df.groupby("custID")[['cust rec','store rec']].transform('sum'))

   cust rec  store rec
0         8          8
1         8          8
2         2          3

then difference on axis=1:

print(df.groupby("custID")[['cust rec','store rec']].transform('sum').diff(axis=1))
   cust rec  store rec
0       NaN        0.0
1       NaN        0.0
2       NaN        1.0

Upvotes: 1

Related Questions