Messak
Messak

Reputation: 463

Pandas calculate common occurrence percentage of a group by

I would think this is a simple task, but have had a difficult time getting it working. I have a pandas dataframe

IDX     Name    Fruit
 0      Mary    Grapes
 1      Tom     Apple
 2      Mary    Grapes
 3      Mary    Apple
 4      Tom     Apple
 5      Mary    Grapes
 6      Tom     Grapes
 7      Tom     Apple

My expected result

IDX     Name    Occurrence     Perc
 0      Mary    Grapes         75%
 1      Tom     Apple          75%

So my end result would be a data frame grouped by name, with the most common occurrence for fruit and the percentage of that occurrence

So to get the most common occurrence i use

df["Occurrence"] = (df.groupby("Name")["Fruit"].transform(lambda x: x.value_counts().index[0])) 

How would i go about populating the perc column?

Upvotes: 0

Views: 558

Answers (3)

rhug123
rhug123

Reputation: 8768

Here is another way.

df['pcts'] = df.groupby(['name','fruit'])['fruit'].transform('size').div(df['fruit'].groupby(df['name']).transform('size'))

df.loc[df.groupby('name')['pcts'].idxmax()]

Upvotes: 1

jezrael
jezrael

Reputation: 862511

Use Series.value_counts with normalize=True in lambda function and return first row by DataFrame.head, last convert Multiindex Series to DataFrame:

df1 = (df.groupby("Name")["Fruit"]
         .apply(lambda x: x.value_counts(normalize=True).head(1))
         .mul(100)
         .rename_axis(['Name','Occurrence'])
         .reset_index(name='Perc'))
print (df1)
   Name Occurrence  Perc
0  Mary     Grapes  75.0
1   Tom      Apple  75.0

Upvotes: 1

BENY
BENY

Reputation: 323226

Let us do crosstab then sort_values with tail

out = pd.crosstab(df.Name,df.Fruit,normalize='index').stack().sort_values().groupby(level=0).tail(1)
Out[65]: 
Name  Fruit 
Mary  Grapes    0.75
Tom   Apple     0.75
dtype: float64

Upvotes: 1

Related Questions