Pandas groupby strings with conditions

Question

import pandas as pd

data = {'Item_No': ['001', '001', '002', '002', '002', '003', '003'],
        'kitting': ['no', 'yes', 'no', 'yes', 'no', 'no', 'no']}
df = pd.DataFrame(data)

I would like to group by 'Item_No' and set the Kitting column to 'yes' for any Item_No that has at least one 'yes', otherwise 'no'. I also want a new column with the percentage of rows per item that are 'yes', like below:

Item_No     Kitting          %kitting 
    001      yes               50%  
    002      yes               33%
    003       no                0%

jezrael · Accepted Answer

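The first idea is to replace the non-'yes' values with NaN in a helper column created with DataFrame.assign, then use GroupBy.agg to take the first non-missing value per group. Groups without any 'yes' come out as missing, which fillna('no') then replaces. For the percentage, take the mean of the boolean mask in a lambda function and multiply by 100: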

m = df['kitting'].eq('yes')  # True where kitting is 'yes'
# Helper columns: m (boolean mask) and a ('yes' or NaN)
out = (df.assign(m=m, a=df['kitting'].where(m))
         .groupby('Item_No', as_index=False)
         .agg(Kitting=('a', 'first'),
              perc=('m', lambda x: x.mean() * 100))
         .fillna('no'))
print(out)
  Item_No Kitting       perc
0     001     yes  50.000000
1     002     yes  33.333333
2     003      no   0.000000
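
The question asks for the percentage as text such as 50%, while the result above holds floats. A minimal sketch of one way to format it, assuming whole-percent rounding is acceptable; the frame is rebuilt here from the printed result so the snippet runs on its own, and the names `result` and `%kitting` are chosen only for illustration:

```python
import pandas as pd

# Rebuild the aggregated result printed above (values copied from that output)
result = pd.DataFrame({'Item_No': ['001', '002', '003'],
                       'Kitting': ['yes', 'yes', 'no'],
                       'perc': [50.0, 100 / 3, 0.0]})

# Round to a whole percent, then convert to text and append '%'
result['%kitting'] = result['perc'].round().astype(int).astype(str) + '%'

print(result[['Item_No', 'Kitting', '%kitting']])
```

Keeping the numeric perc column alongside the formatted one is usually safer, since the string column can no longer be sorted or averaged as a number.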

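Another idea is to use an ordered categorical and aggregate with min. With the categories ordered as ['yes', 'no'], 'yes' is the smallest value, so any group that contains a 'yes' returns 'yes':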

# 'yes' is listed first, so it sorts below 'no'
s = pd.Categorical(df['kitting'], ordered=True, categories=['yes', 'no'])
out = (df.assign(m=s == 'yes', a=s)
         .groupby('Item_No', as_index=False)
         .agg(Kitting=('a', 'min'),
              perc=('m', lambda x: x.mean() * 100)))
print(out)
  Item_No Kitting       perc
0     001     yes  50.000000
1     002     yes  33.333333
2     003      no   0.000000
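
Both approaches boil down to two aggregations over the same boolean mask: "is there any 'yes'" and "what share is 'yes'". The following is a sketch of that reading, not the answer's own method. It assumes the question's sample data, and the names `is_yes`, `any_yes` and `out` are illustrative:

```python
import pandas as pd

data = {'Item_No': ['001', '001', '002', '002', '002', '003', '003'],
        'kitting': ['no', 'yes', 'no', 'yes', 'no', 'no', 'no']}
df = pd.DataFrame(data)

# Boolean helper column: True where kitting is 'yes'
out = (df.assign(is_yes=df['kitting'].eq('yes'))
         .groupby('Item_No', as_index=False)
         .agg(any_yes=('is_yes', 'any'),   # at least one 'yes' in the group
              perc=('is_yes', 'mean')))    # share of 'yes' rows, 0..1

# Turn the boolean flag back into 'yes'/'no' and scale the share to percent
out['Kitting'] = out['any_yes'].map({True: 'yes', False: 'no'})
out['perc'] = out['perc'] * 100
out = out[['Item_No', 'Kitting', 'perc']]
print(out)
```

This variant avoids the NaN helper column and the categorical ordering, at the cost of one extra mapping step after the aggregation.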
