Reputation: 29
I have a purchase record table showing the purchasing history recorded by customers and the store. I would like to get a new column where it shows the sum of the customer records & store records from each customer, as a group, equals. So basically 5 + 3 = 5 + 3 so Yes for 001 but 2 != 3 so No for 002.
Date | custID | itmID | cust rec | store rec |
---|---|---|---|---|
0514 | 001 | A | 5 | 5 |
0514 | 001 | B | 3 | 3 |
0514 | 002 | A | 2 | 3 |
expected result:
Date | custID | itmID | cust rec | store rec | sum match |
---|---|---|---|---|---|
0514 | 001 | A | 5 | 5 | Yes |
0514 | 001 | B | 3 | 3 | Yes |
0514 | 002 | A | 2 | 3 | No |
my code:
df['sum match'] = (sum(df['store rec'])==sum(df['cust rec'])).groupby(match.custID)
error shows 'bool' object has no attribute 'groupby'
Upvotes: 1
Views: 765
Reputation: 75080
You can groupby and transform sum and then check difference of the columns on axis=1. then get the last result to check if it is 0 and conditionally assign values to the new column:
c = (df.groupby("custID")[['cust rec','store rec']]
.transform('sum').diff(axis=1).iloc[:,-1].eq(0))
df['sum match'] = np.where(c,"Yes",'No')
output:
print(df)
Date custID itmID cust rec store rec sum match
0 514 1 A 5 5 Yes
1 514 1 B 3 3 Yes
2 514 2 A 2 3 No
More details of the steps:
Groupby + transform sum returns sum of each group per column:
print(df.groupby("custID")[['cust rec','store rec']].transform('sum'))
cust rec store rec
0 8 8
1 8 8
2 2 3
then difference on axis=1:
print(df.groupby("custID")[['cust rec','store rec']].transform('sum').diff(axis=1))
cust rec store rec
0 NaN 0.0
1 NaN 0.0
2 NaN 1.0
Upvotes: 1