maxmijn

Reputation: 347

Returning the most frequent element in a column of a pandas DataFrame

I have a pandas DataFrame with a column containing strings like this:

percentage | name
-----------------
122        | a
122        | b
122        | b
122        | c

Now I want to return the most frequent name, in this example 'b'. I know I can do this by iterating over the rows and keeping a counter, but there must be a more elegant way to do this.

Upvotes: 0

Views: 159

Answers (3)

jezrael

Reputation: 862511

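You can use value_counts with first_valid_index or idxmax (in older pandas versions argmax did the same as idxmax; newer versions make argmax return an integer position instead):

print(df.name.value_counts().first_valid_index())
# b
print(df.name.value_counts().idxmax())
# b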
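For reference, here is a minimal self-contained sketch of the idxmax approach. The DataFrame is rebuilt from the table in the question; the variable names counts and most_frequent are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's table.
df = pd.DataFrame({
    "percentage": [122, 122, 122, 122],
    "name": ["a", "b", "b", "c"],
})

# value_counts() tallies each distinct name;
# idxmax() returns the index label (the name) with the largest count.
counts = df["name"].value_counts()
most_frequent = counts.idxmax()
print(most_frequent)  # b
```

Using df["name"] rather than df.name avoids clashes if a column name ever collides with a DataFrame attribute.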

Timings:

These timings depend heavily on the size of the Series as well as on the number (and position) of its values:

In [145]: %timeit df.name.value_counts().argmax()
The slowest run took 5.25 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 322 µs per loop

In [146]: %timeit df.name.value_counts().index[0]
The slowest run took 6.32 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 275 µs per loop

In [147]: %timeit df.name.value_counts().first_valid_index()
The slowest run took 5.43 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 292 µs per loop

In [148]: %timeit df.name.value_counts().idxmax()
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 321 µs per loop

Upvotes: 0

Anton Protopopov

Reputation: 31662

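You could use value_counts and argmax (this returns the label on older pandas versions; on newer versions argmax returns an integer position, so use idxmax to get the label):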

In [221]: df.name.value_counts().argmax()
Out[221]: 'b'
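The difference between the two calls can be sketched as follows, assuming pandas 1.0 or later, where Series.argmax returns a position rather than a label. The Series names and the variable names are made up for illustration.

```python
import pandas as pd

# Same names as in the question, as a standalone Series.
names = pd.Series(["a", "b", "b", "c"], name="name")
counts = names.value_counts()

position = counts.argmax()  # integer position of the largest count
label = counts.idxmax()     # index label of the largest count

# The position can be mapped back to the label through the index.
print(label)
print(counts.index[position])
```

Both printed lines show the same name; idxmax simply skips the extra index lookup.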

Upvotes: 3

EdChum

Reputation: 393963

value_counts sorts by count in descending order, so the first entry of its index is the most frequent value:

In [85]:
df['name'].value_counts().index[0]

Out[85]:
'b'

Output from value_counts:

In [86]:
df['name'].value_counts()

Out[86]:
b    2
c    1
a    1
Name: name, dtype: int64
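Note that index[0] returns only one value even when several names share the top count. If ties matter, Series.mode returns all of them. A small sketch with hypothetical data (the tied Series below is not from the question):

```python
import pandas as pd

# Hypothetical data with a tie: "b" and "c" both appear twice.
tied = pd.Series(["a", "b", "b", "c", "c"], name="name")

# mode() returns every value that shares the highest count, sorted.
modes = tied.mode()
print(modes.tolist())  # ['b', 'c']

# index[0] of value_counts() picks just one of the tied values.
first = tied.value_counts().index[0]
print(first in set(modes))  # True
```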

Upvotes: 3
