Reputation: 323
I'm trying to group by the 'keyword' column and get the characteristic with the largest number of records.
Let's consider the pandas df:
pd.DataFrame([['a', 'A'], ['b', 'A'], ['a', 'B'], ['b', 'B'], ['a', 'A'], ['c', 'B']], columns=['Keywords', 'Char'])
For the keyword a, the characteristic A is the most frequent; for the keyword b, either A or B is fine; for the keyword c, B is the most frequent.
In my case I have 10,000 keywords and 3 characteristics. As a return value I want either a pd.Series with the keyword as index and the most frequent characteristic as value, or a dictionary with the keyword as key and the most frequent characteristic as value.
I tried grouping my keywords and characteristics and counting the rows as follows:
res = frame.groupby(['Keywords', 'Char']).size().reset_index().rename(columns={0:'records'})
But I don't know how to get the characteristic corresponding to the maximum.
Expected output (any of these is ok):
pd.Series(data=['A', 'A', 'B'], index = ['a', 'b', 'c'])
or
pd.Series(data=['A', 'B', 'B'], index = ['a', 'b', 'c'])
or
{'a':'A', 'b':'A', 'c':'B'}
or
{'a':'A', 'b':'B', 'c':'B'}
Upvotes: 1
Views: 126
Reputation: 863791
Use Series.value_counts in a lambda function per group of Keywords and return the first value of the index. The idea of the solution is to use value_counts because it sorts values by counts in descending order by default, so the first index value is the most frequent one:
res = frame.groupby('Keywords')['Char'].apply(lambda x: x.value_counts().index[0])
print (res)
Keywords
a A
b B
c B
Name: Char, dtype: object
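If you prefer to stay with the counts DataFrame you already built, here is a minimal sketch of an alternative without a lambda: take the per-pair sizes and use idxmax within each keyword group (for a tie, whichever pair sorts first wins; either is acceptable per your expected output):

```python
import pandas as pd

frame = pd.DataFrame(
    [['a', 'A'], ['b', 'A'], ['a', 'B'], ['b', 'B'], ['a', 'A'], ['c', 'B']],
    columns=['Keywords', 'Char'],
)

# Count rows per (Keywords, Char) pair -> Series with a MultiIndex
counts = frame.groupby(['Keywords', 'Char']).size()

# idxmax per keyword returns the (keyword, char) tuple with the largest
# count; .str[1] extracts the Char component from each tuple.
res = counts.groupby(level='Keywords').idxmax().str[1]
print(res.to_dict())
```

res is a Series indexed by keyword, so res.to_dict() gives the dictionary form directly.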
Upvotes: 3