Reputation: 157
Currently my dataframe is:
import pandas as pd

dd = [[1001,'green apple',1,7],[1001,'red apple',1,2],[1001,'grapes',1,5],[1002,'green apple',2,4],[1002,'red apple',2,4],[1003,'red apple',3,8],[1004,'mango',4,2],[1004,'red apple',4,6]]
df = pd.DataFrame(dd, columns=['colID','colString','custID','colQuantity'])
   colID    colString  custID  colQuantity
0   1001  green apple       1            7
1   1001    red apple       1            2
2   1001       grapes       1            5
3   1002  green apple       2            4
4   1002    red apple       2            4
5   1003    red apple       3            8
6   1004        mango       4            2
7   1004    red apple       4            6
So far I have only managed to filter the rows that contain red apple or green apple, using this code:
selection = ['green apple','red apple']
mask = df.colString.apply(lambda x: any(item in x for item in selection))
df = df[mask]
Current output:
   colID    colString  custID  colQuantity
0   1001  green apple       1            7
1   1001    red apple       1            2
3   1002  green apple       2            4
4   1002    red apple       2            4
5   1003    red apple       3            8
7   1004    red apple       4            6
The desired final output is the summed quantity of green apple AND red apple for each colID that contains both:
colID  custID  colQuantity
 1001       1            9
 1002       2            8
Upvotes: 2
Views: 43
Reputation: 88226
You can use isin to index the dataframe and then groupby.sum:
(df[df.colString.isin(['green apple', 'red apple'])]
   .groupby(['colID', 'colString'], as_index=False)
   .sum())
   colID    colString  custID  colQuantity
0   1001  green apple       1            7
1   1001    red apple       1            2
2   1002  green apple       2            4
3   1002    red apple       2            4
4   1003    red apple       3            8
5   1004    red apple       4            6
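Grouping by both colID and colString keeps one row per fruit, so it does not yet collapse to the one-row-per-colID output asked for in the question. A minimal sketch of one way to get there, assuming "green apple AND red apple" means a colID must contain both items to be kept (the names filtered, n_items, both and result are illustrative, not from the original post):

```python
import pandas as pd

dd = [[1001, 'green apple', 1, 7], [1001, 'red apple', 1, 2], [1001, 'grapes', 1, 5],
      [1002, 'green apple', 2, 4], [1002, 'red apple', 2, 4], [1003, 'red apple', 3, 8],
      [1004, 'mango', 4, 2], [1004, 'red apple', 4, 6]]
df = pd.DataFrame(dd, columns=['colID', 'colString', 'custID', 'colQuantity'])

selection = ['green apple', 'red apple']

# Keep only rows whose colString is exactly one of the selected items.
filtered = df[df['colString'].isin(selection)]

# Count the distinct selected items per colID and broadcast the count back
# to every row, then keep only the colIDs that contain all selected items.
n_items = filtered.groupby('colID')['colString'].transform('nunique')
both = filtered[n_items == len(selection)]

# Sum the quantities per colID; custID is included as a grouping key on the
# assumption that it is constant within each colID, as in the sample data.
result = both.groupby(['colID', 'custID'], as_index=False)['colQuantity'].sum()
print(result)
```

On the sample data this leaves colID 1001 with a total of 9 and colID 1002 with a total of 8; 1003 and 1004 are dropped because they only contain red apple. Note that isin matches whole strings exactly, whereas the lambda in the question matches substrings.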
Upvotes: 2