pandas dataframe group by max on one column

Question

I'm trying to group by the 'keyword' column and get the characteristic with the larger number of records.

Let's consider the pandas df:

pd.DataFrame([['a', 'A'], ['b', 'A'], ['a', 'B'], ['b', 'B'], ['a', 'A'], ['c', 'B']], columns=['Keywords', 'Char'])

For the keyword a the characteristic A is the most frequent, for the keyword b either A or B are ok, for the keyword c, B is the most frequent.

In my case I have 10000 keywords and 3 characteristics. I want to have as a return a pd.Series with the keyword as an index and the most frequent characteristic as value or a dictionary with keyword as key and the most frequent characteristic as value.

I tried grouping my keywords and characteristics and count the rows as following:

res = frame.groupby(['Keywords', 'Char']).size().reset_index().rename(columns={0:'records'})

But I don't know how to get the characteristic corresponding to the maximum.

Expected output (any of this is ok):

pd.Series(data=['A', 'A', 'B'], index = ['a', 'b', 'c'])

or

pd.Series(data=['A', 'B', 'B'], index = ['a', 'b', 'c'])

or

{'a':'A', 'b':'A', 'c':'B'}

or

{'a':'A', 'b':'B', 'c':'B'}

jezrael · Accepted Answer

Use Series.value_counts in lambda function per groups by Keywords and return first value of index. Idea of solution is use value_counts because sorting values by default by counts:

res = frame.groupby('Keywords')['Char'].apply(lambda x: x.value_counts().index[0])
print (res)
Keywords
a    A
b    B
c    B
Name: Char, dtype: object

pandas dataframe group by max on one column

Answers (1)

Related Questions