Reputation: 4100
Sample DF:
ID Name Price Condition Fit_Test
1 Apple 10 Good Super_Fit
2 Apple 10 OK Super_Fit
3 Apple 10 Bad Super_Fit
4 Orange 12 Good Not_Fit
5 Orange 12 OK Not_Fit
6 Banana 15 OK Medium_Fit
7 Banana 15 Bad Medium_Fit
8 Pineapple 25 OK Medium_Fit
9 Pineapple 25 OK Medium_Fit
10 Cherry 30 Bad Medium_Fit
Expected DF:
ID Name Price Condition Fit_Test
1 Apple 10 Good Super_Fit
2 Apple 10 OK Super_Fit
3 Apple 10 Bad Super_Fit
4 Orange 12 Good Not_Fit
6 Banana 15 OK Medium_Fit
8 Pineapple 25 OK Medium_Fit
9 Pineapple 25 OK Medium_Fit
10 Cherry 30 Bad Medium_Fit
Problem Statement:
I want to group-by
by Name
and Price
and then filter based on Condition
.
If within a Name
and Price
All 3 conditions of Good, Bad and OK exist then Keep only Good one and Fit_Test is not Super_Fit
If within a Name and Price conditions of Good and OK exist then Keep only Good one (Id 4,5 is only ID 4 in expected) and Fit_Test is not Super_Fit
If within a Name
and Price
conditions of Bad and OK exist then Keep only OK one (Id 6,7 is only ID 6 in expected) and Fit_Test is not Super_Fit
If within a Name
and Price
conditions of OK and OK exist, Good and Good Exist or just Bad exist then don't do anything then Keep only OK one (Id 8,9,10 is ID 8,9,10 in expected) and Fit_Test is not Super_Fit
Update on Answer(s)
df
where you don't have a condition of Fit_Test
column. In this answer the Expected DF will not have rows 2 & 3 as also shown in the answerFit_Test
and should only work when the value is not Super_Fit
.In both the solutions the Filtering of rows based on Condition
column and group by of 2 columns is same.
I found things with filter + group by on the numeric columns but none on the String columns.
Upvotes: 3
Views: 82
Reputation: 863226
Idea is create set
s for comparing:
a = df.join(df.groupby(['Price','Name'])['Condition'].apply(set).rename('m'),
on=['Price','Name'])['m']
print (a)
0 {Bad, Good, OK}
1 {Bad, Good, OK}
2 {Bad, Good, OK}
3 {Good, OK}
4 {Good, OK}
5 {Bad, OK}
6 {Bad, OK}
7 {OK}
8 {OK}
9 {Bad}
Name: m, dtype: object
m1 = (a == set({'Bad', 'Good', 'OK'})) | (a == set({'Good', 'OK'}))
m2 = a == set({'Bad', 'OK'})
#check if unique value - length of set is 1
m3 = a.str.len() == 1
m4 = df['Condition'] == 'Good'
m5 = df['Condition'] == 'OK'
df = df[(m1 & m4) | (m2 & m5) | m3]
print (df)
ID Name Price Condition
0 1 Apple 10 Good
3 4 Orange 12 Good
5 6 Banana 15 OK
7 8 Pineapple 25 OK
8 9 Pineapple 25 OK
9 10 Cherry 30 Bad
EDIT for testing:
For testing is possible use assign
:
print (df.assign(sets=a, m1 = m1, m2=m2, m3=m3, m4=m4, m5=m5, m=m))
ID Name Price Condition sets m1 m2 m3 \
0 1 Apple 10 Good {Bad, Good, OK} True False False
1 2 Apple 10 OK {Bad, Good, OK} True False False
2 3 Apple 10 Bad {Bad, Good, OK} True False False
3 4 Orange 12 Good {Good, OK} True False False
4 5 Orange 12 OK {Good, OK} True False False
5 6 Banana 15 OK {Bad, OK} False True False
6 7 Banana 15 Bad {Bad, OK} False True False
7 8 Pineapple 25 OK {OK} False False True
8 9 Pineapple 25 OK {OK} False False True
9 10 Cherry 30 Bad {Bad} False False True
m4 m5 m
0 True False True
1 False True False
2 False False False
3 True False True
4 False True False
5 False True True
6 False False False
7 False True True
8 False True True
9 False False True
EDIT for update:
For new condition use:
m6 = df['Fit_Test'] == 'Super_Fit'
df = df[((m1 & m4) | (m2 & m5) | m3) | m6]
print (df)
ID Name Price Condition Fit_Test
0 1 Apple 10 Good Super_Fit
1 2 Apple 10 OK Super_Fit
2 3 Apple 10 Bad Super_Fit
3 4 Orange 12 Good Not_Fit
5 6 Banana 15 OK Medium_Fit
7 8 Pineapple 25 OK Medium_Fit
8 9 Pineapple 25 OK Medium_Fit
9 10 Cherry 30 Bad Medium_Fit
Upvotes: 2
Reputation: 7510
If you don't have too many conditions (like here it is only 3), the following is an easy workaround:
df.loc[df["Condition"] == 'Good',"Condition"] = 3
df.loc[df["Condition"] == 'OK',"Condition"] = 2
df.loc[df["Condition"] == 'Bad',"Condition"] = 1
df = df.groupby(['Name','Price']).max()
df.loc[df["Condition"] == 3] = "Good"
df.loc[df["Condition"] == 2] = "OK"
df.loc[df["Condition"] == 1] = "Bad"
Upvotes: 0