Reputation: 177
data = {'Item_No':['001', '001', '002','002','002','003','003'],
'kitting':['no','yes', 'no', 'yes', 'no','no','no']}
df = pd.DataFrame(data)
I would like to group by 'Item_No'. Change Kitting column to 'yes' for any Item_no that has at least 1 'yes' otherwise 'no'. And a new Column to get % of total. like below...
Item_No Kitting %kitting
001 yes 50%
002 yes 33%
003 no 0%
Upvotes: 1
Views: 78
Reputation: 20659
We can use GroupBy.value_counts
with normalize
set to True
. Then use Series.reindex
to extract where level 1 has 'yes'
. If prec
is 0 then no yes
is present.
out = df.groupby("Item_No")["kitting"].value_counts(normalize=True)
idx = pd.MultiIndex.from_product((df["Item_No"].unique(), ["yes", "no"]))
out = out.reindex(idx, fill_value=0).loc[:, "yes"].to_frame("prec")
out = (
out.assign(kitting=np.where(out["prec"].eq(0), "no", "yes"))
.rename_axis("Item_No")
.reset_index()
)
print(out)
Item_No prec kitting
0 001 0.500000 yes
1 002 0.333333 yes
2 003 0.000000 no
Another option (thanks to jezrael)
out = (
df.groupby("Item_No")["kitting"]
.value_counts(normalize=True)
.unstack(fill_value=0)["yes"]
.to_frame("prec")
)
out.assign(kitting=np.where(out["prec"].eq(0), "no", "yes")).reset_index()
Item_No prec kitting
0 001 0.500000 yes
1 002 0.333333 yes
2 003 0.000000 no
Upvotes: 1
Reputation: 863176
First idea is replace non yes
values to NaN
in helper column in DataFrame.assign
and use GroupBy.agg
for first non missing value and for percentage is used mean
in lambda function with multiple 100
:
m = df['kitting'].eq('yes')
df = (df.assign(m = m, a = df['kitting'].where(m))
.groupby('Item_No', as_index=False)
.agg(Kitting = ('a','first'),
perc = ('m', lambda x: x.mean() * 100))
.fillna('no'))
print (df)
Item_No Kitting perc
0 001 yes 50.000000
1 002 yes 33.333333
2 003 no 0.000000
Another idea is use ordered categoricals
and aggregate min
first:
s = pd.Categorical(df['kitting'], ordered=True, categories=['yes','no'])
df = (df.assign(m = s == 'yes', a = s)
.groupby('Item_No', as_index=False)
.agg(Kitting = ('a','min'),
perc = ('m', lambda x: x.mean() * 100)))
print (df)
Item_No Kitting perc
0 001 yes 50.000000
1 002 yes 33.333333
2 003 no 0.000000
Upvotes: 3