spidermarn
spidermarn

Reputation: 939

Python: How to label this dataset

Hi I have df like this:

df = pd.DataFrame({'CaseNo':[1,1,1,1,2,2,3,3,3,4,4],
                   'Category':['A','A','IOU','A','B','B','IOU','IOU','IOU','C','IOU']})

enter image description here

I'd like to label CaseNo col based on this condition:
For each CaseNo, if there is an occurrence of value 'IOU' in Category column with any values (except 'IOU') preceding it - label 'YES', else label 'NO'.

How can I do this in Python? Expected outcome: enter image description here

Upvotes: 2

Views: 1413

Answers (4)

yatu
yatu

Reputation: 88226

IIUC and assuming that YES should only be set when there's at least one IOU and some value other than IOU preceding it:

m = (df.Category.eq('IOU').view('i1')
       .groupby(df.CaseNo)
       .transform(lambda x: x.diff().eq(1).any()))
df['Label'] = m.map({True:'Yes',False:'No'})

print(df)

    CaseNo Category Label
0        1        A   Yes
1        1        A   Yes
2        1      IOU   Yes
3        1        A   Yes
4        2        B    No
5        2        B    No
6        3      IOU    No
7        3      IOU    No
8        3      IOU    No
9        4        C   Yes
10       4      IOU   Yes

The idea here is that Series.diff() will be 1, when there is a 1 (IOU) preceded by a 0 (no IOU)

Upvotes: 2

sammywemmy
sammywemmy

Reputation: 28644

Groupby with a count of IOU is one way to solve it :

res = (df
       .assign(val = df.groupby('CaseNo').Category.transform(",".join),
               label = lambda x: np.where(x.val.str.count("IOU").eq(1) ,"YES","NO")
              )
       .drop('val',axis=1)
      )

res


    CaseNo  Category    label
0      1       A        YES
1      1       A        YES
2      1      IOU       YES
3      1       A        YES
4      2       B        NO
5      2       B        NO
6      3     IOU        NO
7      3     IOU        NO
8      3     IOU        NO
9      4       C        YES
10     4     IOU        YES

Upvotes: 1

paradocslover
paradocslover

Reputation: 3294

The following code does what you need. It uses groupby and index. The advantage is you can always modify the function func to make an update in what you want. The other answers are no doubt specific but can't be modified much.

def func(x):
    try:
        return 'YES' if x.index('IOU')!=0 else 'NO'
    except:
        return "NO"
mapper = df.groupby("CaseNo")['Category'].agg(list).apply(func)
df['Label'] = df['CaseNo'].apply(lambda x: mapper[x])
print(df)

Output:

    CaseNo Category Label
0        1        A   YES
1        1        A   YES
2        1      IOU   YES
3        1        A   YES
4        2        B    NO
5        2        B    NO
6        3      IOU    NO
7        3      IOU    NO
8        3      IOU    NO
9        4        C   YES
10       4      IOU   YES

Upvotes: 1

BENY
BENY

Reputation: 323226

Let us try transform with first and any

g=df.Category.eq('IOU').groupby(df['CaseNo'])
m=g.transform('any') & ~g.transform('first')
df['New']=m.map({True:'Yes',False:'No'})
df
Out[24]: 
    CaseNo Category  New
0        1        A  Yes
1        1        A  Yes
2        1      IOU  Yes
3        1        A  Yes
4        2        B   No
5        2        B   No
6        3      IOU   No
7        3      IOU   No
8        3      IOU   No
9        4        C  Yes
10       4      IOU  Yes

Upvotes: 1

Related Questions