Reputation: 463
I would think this is a simple task, but have had a difficult time getting it working. I have a pandas dataframe
IDX Name Fruit
0 Mary Grapes
1 Tom Apple
2 Mary Grapes
3 Mary Apple
4 Tom Apple
5 Mary Grapes
6 Tom Grapes
7 Tom Apple
My expected result
IDX Name Occurrence Perc
0 Mary Grapes 75%
1 Tom Apple 75%
So my end result would be a data frame grouped by name, with the most common occurrence for fruit and the percentage of that occurrence
So to get the most common occurrence i use
df["Occurrence"] = (df.groupby("Name")["Fruit"].transform(lambda x: x.value_counts().index[0]))
How would i go about populating the perc column?
Upvotes: 0
Views: 558
Reputation: 8768
Here is another way.
df['pcts'] = df.groupby(['name','fruit'])['fruit'].transform('size').div(df['fruit'].groupby(df['name']).transform('size'))
df.loc[df.groupby('name')['pcts'].idxmax()]
Upvotes: 1
Reputation: 862511
Use Series.value_counts
with normalize=True
in lambda function and return first row by DataFrame.head
, last convert Multiindex Series
to DataFrame
:
df1 = (df.groupby("Name")["Fruit"]
.apply(lambda x: x.value_counts(normalize=True).head(1))
.mul(100)
.rename_axis(['Name','Occurrence'])
.reset_index(name='Perc'))
print (df1)
Name Occurrence Perc
0 Mary Grapes 75.0
1 Tom Apple 75.0
Upvotes: 1
Reputation: 323226
Let us do crosstab
then sort_values
with tail
out = pd.crosstab(df.Name,df.Fruit,normalize='index').stack().sort_values().groupby(level=0).tail(1)
Out[65]:
Name Fruit
Mary Grapes 0.75
Tom Apple 0.75
dtype: float64
Upvotes: 1