Reputation: 4640

pandas: How to get the most frequent item in pandas series?

How can I get the most frequent item in a pandas series?

Consider the series s

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

The returned value should be 3

Upvotes: 7

Answers (4)

jpp

Reputation: 164623

You can just use pd.Series.mode and extract the first value:

res = s.mode().iloc[0]
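As a minimal sketch of what this does on the question's series: `mode` returns a Series holding every value tied for the highest count, in sorted order, so `.iloc[0]` picks the smallest of the tied values. The `tied` series below is a made-up example, not from the question.

```python
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# mode() returns a Series of all most-frequent values, sorted ascending
modes = s.mode()
most_frequent = modes.iloc[0]
print(most_frequent)

# Hypothetical series with a tie: both 1 and 2 occur twice
tied = pd.Series([2, 2, 1, 1, 3])
tied_modes = tied.mode().tolist()
print(tied_modes)
```

If ties matter for your data, inspect the whole result of `mode()` rather than only its first element.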

Using mode this way is not necessarily inefficient. As always, test with your data to see what suits.

import numpy as np, pandas as pd
from scipy.stats.mstats import mode
from collections import Counter

np.random.seed(0)

s = pd.Series(np.random.randint(0, 100, 100000))

def jez_np(s):
    _, idx, counts = np.unique(s, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]
    val = s.iloc[index]  # idx holds positions, so look up positionally
    return val

def pir(s):
    i, r = s.factorize()
    return r[np.bincount(i).argmax()]

%timeit s.mode().iloc[0]                 # 1.82 ms
%timeit pir(s)                           # 2.21 ms
%timeit s.value_counts().index[0]        # 2.52 ms
%timeit mode(s).mode[0]                  # 5.64 ms
%timeit jez_np(s)                        # 8.26 ms
%timeit Counter(s).most_common(1)[0][0]  # 8.27 ms

Upvotes: 9

piRSquared

Reputation: 294218

`pandas.factorize` and `numpy.bincount`

This is very similar to @jezrael's NumPy answer. The difference is the use of factorize instead of numpy.unique.

factorize returns an integer factorization (one integer code per element) and the array of unique values.
bincount counts how many times each code, and therefore each unique value, occurs.
argmax identifies which bin, or factor, is the most frequent.
Use the position of the bin returned by argmax to look up the most frequent value in the array of unique values.

i, r = s.factorize()
r[np.bincount(i).argmax()]

3
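The steps above can be traced on the question's series. This is only an illustrative sketch; the variable names `codes`, `uniques` and `counts` are mine, not part of the answer. One assumption worth stating: the series has no missing values. factorize encodes missing values as -1, and np.bincount rejects negative input, so NaNs would need to be dropped first.

```python
import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# codes: one integer per element; uniques: values in order of first appearance
codes, uniques = s.factorize()
# codes   -> [0 1 2 2 2 1 3 0 4 5 3 2 2 2]
# uniques -> [1 5 3 2 8 10]

# counts[k] is how often uniques[k] occurs
counts = np.bincount(codes)
# counts  -> [2 2 6 2 1 1]

# position of the largest count, mapped back to the original value
top = uniques[counts.argmax()]
print(top)
```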

Upvotes: 3

jezrael

Reputation: 862511

Use value_counts and select the first value of its index:

val = s.value_counts().index[0]
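A short sketch of why this works: by default value_counts sorts by count in descending order, so the first index label is the most frequent value and the first entry is its count. `idxmax` is shown as an alternative that does not rely on the sort order.

```python
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# Index = distinct values, data = their counts, sorted by count (descending)
vc = s.value_counts()
val = vc.index[0]       # most frequent value
count = vc.iloc[0]      # how many times it occurs

# Alternative: label of the largest count, independent of sort order
val_alt = vc.idxmax()
print(val, count, val_alt)
```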

Or Counter.most_common:

from collections import Counter

val = Counter(s).most_common(1)[0][0]

Or a NumPy solution:

_, idx, counts = np.unique(s, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
val = s.iloc[index]  # idx holds positions, so look up positionally
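One detail to watch in the NumPy version: `return_index=True` gives positions in the underlying array, not index labels, so positional lookup with `.iloc` is the safer choice when the series does not have the default integer index. A sketch with hypothetical string labels:

```python
import numpy as np
import pandas as pd

# Same values as the question, but with made-up string labels as the index
s = pd.Series([1, 5, 3, 3, 3, 5, 2, 1, 8, 10, 2, 3, 3, 3],
              index=list("abcdefghijklmn"))

# values: sorted distinct values; idx: position of each first occurrence
values, idx, counts = np.unique(s, return_index=True, return_counts=True)
pos = idx[np.argmax(counts)]

val = s.iloc[pos]                     # positional lookup into the series
val_direct = values[np.argmax(counts)]  # same value without touching the series
print(val, val_direct)
```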

Upvotes: 7

ramakrishnareddy

Reputation: 631

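from scipy import stats
import pandas as pd

x = [1, 5, 3, 3, 3, 5, 2, 1, 8, 10, 2, 3, 3, 3]
data = pd.DataFrame({"values": x})

print(stats.mode(data["values"]))

Output (from the SciPy version used here; recent SciPy releases may print scalar values instead of one-element arrays):

ModeResult(mode=array([3], dtype=int64), count=array([6]))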

Upvotes: 1
