Reputation: 1694
Hi, I have a dataframe, and I'd like to select the group with the highest percentage from a frequency table.
import pandas as pd
d = {'c1':['a', 'a', 'b', 'b', 'c', 'c'], 'c2':['Low', 'High', 'Low', 'High', 'High', 'High']}
dd = pd.DataFrame(data=d)
dd.groupby('c1')['c2'].value_counts(normalize=True).mul(100)
It returns a frequency table:
c1  c2  
a   High     50.0
    Low      50.0
b   High     50.0
    Low      50.0
c   High    100.0
Name: c2, dtype: float64
I'd like to print out c, which has the highest percentage, 100.0. I'm able to use max() to print out 100.0, but I don't know how to print out c.
Upvotes: 2
Views: 142
Reputation: 323316
Maybe just do
dd.groupby('c1')['c2'].value_counts(normalize=True).idxmax()[0]
Out[102]: 'c'
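To see why the [0] is needed: the result of value_counts here has a two-level index (c1, c2), so idxmax returns the whole label as a tuple, and the first element is the c1 value. A minimal sketch, assuming the dd frame from the question; the names freq, label and group are only for illustration:

```python
import pandas as pd

d = {'c1': ['a', 'a', 'b', 'b', 'c', 'c'],
     'c2': ['Low', 'High', 'Low', 'High', 'High', 'High']}
dd = pd.DataFrame(data=d)

# Percentage of each c2 value within each c1 group
freq = dd.groupby('c1')['c2'].value_counts(normalize=True).mul(100)

label = freq.idxmax()  # full (c1, c2) label of the largest percentage
group = label[0]       # first level of the label, i.e. the c1 value

print(label)  # ('c', 'High')
print(group)  # c
```

Multiplying by 100 doesn't change which row is the largest, so it can be left out when only the label is needed.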
Upvotes: 1
Reputation: 26676
Let's try reset_index to drop level 1, and then find the index of the maximum using idxmax:
dd.groupby('c1')['c2'].value_counts(normalize=True).mul(100).reset_index(level=1, drop=True).idxmax()
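Note that idxmax only returns the first label when several rows share the maximum. If ties matter, one possible variation is to reduce the table to one value per c1 group and then keep every group that reaches the overall maximum. This is just a sketch on the question's data; freq, per_group, best and tied are illustrative names:

```python
import pandas as pd

d = {'c1': ['a', 'a', 'b', 'b', 'c', 'c'],
     'c2': ['Low', 'High', 'Low', 'High', 'High', 'High']}
dd = pd.DataFrame(data=d)

freq = dd.groupby('c1')['c2'].value_counts(normalize=True).mul(100)

# Largest percentage within each c1 group, indexed by c1 only
per_group = freq.groupby(level='c1').max()

best = per_group.idxmax()  # first c1 label with the highest percentage
# Every c1 label that reaches the overall maximum, in case of ties
tied = per_group[per_group == per_group.max()].index.tolist()

print(best, per_group.max())  # c 100.0
print(tied)                   # ['c']
```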
Upvotes: 5