Reputation: 939
Hi I have df like this:
df = pd.DataFrame({'CaseNo':[1,1,1,1,2,2,3,3,3,4,4],
'Category':['A','A','IOU','A','B','B','IOU','IOU','IOU','C','IOU']})
I'd like to label CaseNo
col based on this condition:
For each CaseNo
, if there is an occurrence of value 'IOU' in Category
column with any values (except 'IOU') preceding it - label 'YES', else label 'NO'.
How can I do this in Python? Expected outcome:
Upvotes: 2
Views: 1413
Reputation: 88226
IIUC and assuming that YES
should only be set when there's at least one IOU
and some value other than IOU
preceding it:
m = (df.Category.eq('IOU').view('i1')
.groupby(df.CaseNo)
.transform(lambda x: x.diff().eq(1).any()))
df['Label'] = m.map({True:'Yes',False:'No'})
print(df)
CaseNo Category Label
0 1 A Yes
1 1 A Yes
2 1 IOU Yes
3 1 A Yes
4 2 B No
5 2 B No
6 3 IOU No
7 3 IOU No
8 3 IOU No
9 4 C Yes
10 4 IOU Yes
The idea here is that Series.diff()
will be 1, when there is a 1 (IOU
) preceded by a 0
(no IOU
)
Upvotes: 2
Reputation: 28644
Groupby
with a count of IOU
is one way to solve it :
res = (df
.assign(val = df.groupby('CaseNo').Category.transform(",".join),
label = lambda x: np.where(x.val.str.count("IOU").eq(1) ,"YES","NO")
)
.drop('val',axis=1)
)
res
CaseNo Category label
0 1 A YES
1 1 A YES
2 1 IOU YES
3 1 A YES
4 2 B NO
5 2 B NO
6 3 IOU NO
7 3 IOU NO
8 3 IOU NO
9 4 C YES
10 4 IOU YES
Upvotes: 1
Reputation: 3294
The following code does what you need. It uses groupby
and index
. The advantage is you can always modify the function func
to make an update in what you want. The other answers are no doubt specific but can't be modified much.
def func(x):
try:
return 'YES' if x.index('IOU')!=0 else 'NO'
except:
return "NO"
mapper = df.groupby("CaseNo")['Category'].agg(list).apply(func)
df['Label'] = df['CaseNo'].apply(lambda x: mapper[x])
print(df)
Output:
CaseNo Category Label
0 1 A YES
1 1 A YES
2 1 IOU YES
3 1 A YES
4 2 B NO
5 2 B NO
6 3 IOU NO
7 3 IOU NO
8 3 IOU NO
9 4 C YES
10 4 IOU YES
Upvotes: 1
Reputation: 323226
Let us try transform
with first
and any
g=df.Category.eq('IOU').groupby(df['CaseNo'])
m=g.transform('any') & ~g.transform('first')
df['New']=m.map({True:'Yes',False:'No'})
df
Out[24]:
CaseNo Category New
0 1 A Yes
1 1 A Yes
2 1 IOU Yes
3 1 A Yes
4 2 B No
5 2 B No
6 3 IOU No
7 3 IOU No
8 3 IOU No
9 4 C Yes
10 4 IOU Yes
Upvotes: 1