Luca Perniè
Luca Perniè

Reputation: 33

Pandas: return the occurrences of the most frequent value for each group (possibly without apply)

Let's assume the input dataset:

test1 = [[0,7,50], [0,3,51], [0,3,45], [1,5,50],[1,0,50],[2,6,50]]
df_test = pd.DataFrame(test1, columns=['A','B','C'])

that corresponds to:

    A   B   C
0   0   7   50
1   0   3   51
2   0   3   45
3   1   5   50
4   1   0   50
5   2   6   50

I would like to obtain the a dataset grouped by 'A', together with the most common value for 'B' in each group, and the occurrences of that value:

A   most_freq freq
0   3          2
1   5          1
2   6          1

I can obtain the first 2 columns with:

grouped = df_test.groupby("A")
out_df = pd.DataFrame(index=grouped.groups.keys())
out_df['most_freq'] = df_test.groupby('A')['B'].apply(lambda x: x.value_counts().idxmax())

but I am having problems the last column. Also: is there a faster way that doesn't involve 'apply'? This solution doesn't scale well with lager inputs (I also tried dask).

Thanks a lot!

Upvotes: 3

Views: 686

Answers (1)

jezrael
jezrael

Reputation: 862511

Use SeriesGroupBy.value_counts which sorting by default, so then add DataFrame.drop_duplicates for top values after Series.reset_index:

df = (df_test.groupby('A')['B']
             .value_counts()
             .rename_axis(['A','most_freq'])
             .reset_index(name='freq')
             .drop_duplicates('A'))
print (df)
   A  most_freq  freq
0  0          3     2
2  1          0     1
4  2          6     1

Upvotes: 3

Related Questions