Reputation: 347
I have a pandas DataFrame with a column containing strings like this:
percentage | name
-----------------
122 | a
122 | b
122 | b
122 | c
Now I want to return the most frequent name, in this example 'b'. I know I can do this by iterating over the rows and keeping a counter, but there must be a more elegant way to do this.
Upvotes: 0
Views: 159
Reputation: 862511
You can use value_counts with first_valid_index or idxmax (in older pandas versions argmax was an alias for idxmax; since pandas 1.0 it returns the integer position instead):
print(df.name.value_counts().first_valid_index())
#b
print(df.name.value_counts().idxmax())
#b
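For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question; the variable names `counts` and `most_frequent` are only illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "percentage": [122, 122, 122, 122],
    "name": ["a", "b", "b", "c"],
})

# value_counts() returns a Series of counts indexed by the distinct names,
# sorted by count in descending order.
counts = df["name"].value_counts()

# idxmax() returns the index label (the name) that has the largest count.
most_frequent = counts.idxmax()
print(most_frequent)  # b
```

Bracket access (`df["name"]`) is used here instead of `df.name` only because it also works for column names that clash with DataFrame attributes.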
Timings:
These timings depend heavily on the size of the DataFrame, as well as on the number (and position) of distinct values:
In [145]: %timeit df.name.value_counts().argmax()
The slowest run took 5.25 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 322 µs per loop
In [146]: %timeit df.name.value_counts().index[0]
The slowest run took 6.32 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 275 µs per loop
In [147]: %timeit df.name.value_counts().first_valid_index()
The slowest run took 5.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 292 µs per loop
In [148]: %timeit df.name.value_counts().idxmax()
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 321 µs per loop
Upvotes: 0
Reputation: 31662
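You could use value_counts and argmax (this returned the label in the pandas version current at the time; since pandas 1.0, argmax returns the integer position, so use idxmax to get the label):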
In [221]: df.name.value_counts().argmax()
Out[221]: 'b'
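A short sketch of how the two calls differ, assuming pandas 1.0 or later, where `argmax` is positional. The DataFrame is rebuilt from the question's sample; `position` and `label` are illustrative names.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"name": ["a", "b", "b", "c"]})
counts = df["name"].value_counts()

# Assumption: pandas >= 1.0, where argmax() gives the integer position
# of the maximum rather than its index label.
position = counts.argmax()

# Map the position back to the label, or call idxmax() to get it directly.
label = counts.index[position]
print(label == counts.idxmax())  # True
```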
Upvotes: 3
Reputation: 393963
You can access the index of the value_counts result, which is sorted by count in descending order:
In [85]:
df['name'].value_counts().index[0]
Out[85]:
'b'
Output from value_counts:
In [86]:
df['name'].value_counts()
Out[86]:
b 2
c 1
a 1
Name: name, dtype: int64
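One caveat with taking `index[0]`: if several names share the highest count, you only get one of them. A possible alternative, sketched here on made-up data that contains a tie (the `names` Series is hypothetical, not from the question), is `Series.mode()`, which returns every most-frequent value:

```python
import pandas as pd

# Hypothetical data in which 'b' and 'c' are tied for most frequent.
names = pd.Series(["a", "b", "b", "c", "c"], name="name")

# mode() returns a Series holding all values that share the highest count,
# in sorted order.
modes = names.mode()
print(modes.tolist())  # ['b', 'c']
```

If you only ever need a single value, `value_counts().index[0]` or `idxmax()` is enough.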
Upvotes: 3