Reputation: 11
I'm trying to concatenate some conditions but it's getting quite hard to solve it, here's the problem:
I have a df with 3 columns:
| Column1 | Column2 | Column3 |
|:-------:|:-------:|:-------:|
| group1 | A | x |
| group1 | B | p |
| group1 | C | y |
| group2 | A | x |
| group2 | B | j |
| group2 | C | y |
| group3 | A | x |
| group3 | B | p |
| group3 | C | z |
| group4 | A | h |
| group4 | B | p |
| group4 | C | y |
So, what I'm trying to do is to check if for each group of values all the conditions are met. If one of them doesn't, all the group should be "KO". Conditions: -For groupx
The output should be like this:
| Column1 | Column2 | Column3 | Column4 | Column5 |
|:-------:|:-------:|:-------:|:-------:|:-------:|
| group1 | A | x | OK | OK |
| group1 | B | p | OK | OK |
| group1 | C | y | OK | OK |
| group2 | A | x | OK | KO |
| group2 | B | j | KO | KO |
| group2 | C | y | OK | KO |
| group3 | A | x | OK | KO |
| group3 | B | p | OK | KO |
| group3 | C | z | KO | KO |
| group4 | A | h | OK | OK |
| group4 | B | p | OK | OK |
| group4 | C | y | OK | OK |
In column4 we have result for each row of the df and in column5 we check that or each group all the conditions are satisfied.
Thanks a lot for the help!
Upvotes: 1
Views: 77
Reputation: 377
A solution without for
loops:
import numpy as np
import pandas as pd
df = pd.DataFrame({"Column1":[1,1,1,2,2,2,3,3,3,4,4,4],
"Column2":["A","B","C","A","B","C","A","B","C","A","B","C"],
"Column3":["x","p","y","x","j","y","x","p","z","h","p","y"]})
def check(x, y):
conditions = {"A": "x|h", "B": "p", "C": "y"}
if y in conditions[x]:
return "OK"
else:
return "KO"
df["Column4"] = df.apply(lambda row: check(row["Column2"], row["Column3"]), axis=1)
df["Column5"] = np.where(df.Column4.eq("OK").groupby(df.Column1).sum().eq(3).repeat(3), "OK", "KO")
print(df)
Out:
Column1 Column2 Column3 Column4 Column5
0 1 A x OK OK
1 1 B p OK OK
2 1 C y OK OK
3 2 A x OK KO
4 2 B j KO KO
5 2 C y OK KO
6 3 A x OK KO
7 3 B p OK KO
8 3 C z KO KO
9 4 A h OK OK
10 4 B p OK OK
11 4 C y OK OK
Upvotes: 0
Reputation: 1986
Try this... A simple function called in apply-lambda then for loop to get the desired results... Not sure this is the optimal way though...
df = pd.DataFrame({"Column1":[1,1,1,2,2,2,3,3,3,4,4,4],
"Column2":["A","B","C","A","B","C","A","B","C","A","B","C"],
"Column3":["x","p","y","x","j","y","x","p","z","h","p","y"]})
# Sorting data on column1;
df = df.sort_values('Column1')
df.index = range(0,df.shape[0])
# Column4
def fun(x,y):
if (x=='A' and y=='x') or (x=='A' and y=='h') or (x=='B' and y=='p') or (x=='C' and y=='y'):
return "OK"
else:
return "KO"
df["Column4"] = df.apply(lambda x: fun(x["Column2"], x["Column3"]), axis=1)
# Column5
lst = []
for i in df["Column1"].unique():
df_temp = df[df["Column1"] == i]
if "KO" in list(df_temp["Column4"]):
lst = lst + ["KO"]*df_temp.shape[0]
else:
lst = lst + ["OK"]*df_temp.shape[0]
df["Column5"] = lst
Hope this Helps...
# Output of df
Column1 Column2 Column3 Column4 Column5
0 1 A x OK OK
1 1 B p OK OK
2 1 C y OK OK
3 2 A x OK KO
4 2 B j KO KO
5 2 C y OK KO
6 3 A x OK KO
7 3 B p OK KO
8 3 C z KO KO
9 4 A h OK OK
10 4 B p OK OK
11 4 C y OK OK
Upvotes: 0
Reputation: 1683
Actually my answer is pretty similar to that @Sachin Koli's, but we took a different approach for Column5
import pandas as pd
df = pd.DataFrame({
'Column1':['group1', 'group1','group1','group2','group2','group2','group3','group3','group3','group4','group4','group4'],
'Column2':['A', 'B','C','A','B','C','A','B','C','A','B','C'],
'Column3':['x', 'p','y','x','j','y','x','p','z','h','p','y']
}, columns=['Column1', 'Column2', 'Column3', 'Column4', 'Column5'])
def helper(i, j):
if (i=='A' and (j=='x' or j=='h')) or (i=='B' and j=='p') or (i=='C' and j=='y'):
return 'OK'
return 'KO'
df['Column4'] = df.apply(lambda x: helper(x[1], x[2]), axis=1)
for idx in df.index:
grp = df['Column1'].iloc[idx]
df['Column5'].iloc[idx] = 'OK' if all(df[df['Column1']==grp]['Column4'] == 'OK') else 'KO'
UPDATE
You could use a list comprehension for Column5
df['Column5'] = ['OK' if all(df[df['Column1']==grp]['Column4'] == 'OK') else 'KO' for grp in df['Column1']]
Upvotes: 1