Reputation: 4640
How can I get the most frequent item in a pandas series?
Consider the series s
s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)
The returned value should be 3
Upvotes: 7
Views: 7029
Reputation: 164623
You can just use pd.Series.mode and extract the first value:
res = s.mode().iloc[0]
This is not necessarily inefficient. As always, test with your data to see what suits.
import numpy as np, pandas as pd
from scipy.stats.mstats import mode
from collections import Counter
np.random.seed(0)
s = pd.Series(np.random.randint(0, 100, 100000))
def jez_np(s):
    _, idx, counts = np.unique(s, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]
    val = s[index]
    return val

def pir(s):
    i, r = s.factorize()
    return r[np.bincount(i).argmax()]

%timeit s.mode().iloc[0] # 1.82 ms
%timeit pir(s) # 2.21 ms
%timeit s.value_counts().index[0] # 2.52 ms
%timeit mode(s).mode[0] # 5.64 ms
%timeit jez_np(s) # 8.26 ms
%timeit Counter(s).most_common(1)[0][0] # 8.27 ms
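One thing the timings do not show is what happens when several values tie for most frequent. The sketch below uses a made-up series (`tied` is a hypothetical name, not from the answer) to illustrate that `mode()` returns every tied value in sorted order, so `.iloc[0]` picks the smallest of them. Which tied value `value_counts()` lists first is not something to rely on.

```python
import pandas as pd

# Hypothetical series in which 3 and 5 both occur three times.
tied = pd.Series([5, 5, 5, 3, 3, 3, 1])

# mode() returns all tied values, sorted ascending.
modes = tied.mode()

# Taking the first element therefore yields the smallest tied value.
first_mode = modes.iloc[0]

# value_counts() also puts a most frequent value first, but the
# order among ties should not be relied on.
top_by_counts = tied.value_counts().index[0]

print(modes.tolist())
print(first_mode)
```

If the choice among tied values matters for your data, it is probably safer to inspect the full result of `mode()` than to take the first element blindly.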
Upvotes: 9
Reputation: 294218
pandas.factorize and numpy.bincount
This is very similar to @jezrael's NumPy answer. The difference is the use of factorize instead of numpy.unique.
- factorize returns an integer factorization and the unique values
- bincount counts how many times each unique value occurs
- argmax identifies which bin, or factor, is the most frequent
- Use the bin position returned by argmax to look up the most frequent value in the array of unique values
i, r = s.factorize()
r[np.bincount(i).argmax()]
3
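To make the steps above concrete, here is a sketch that prints each intermediate result for the series from the question. The variable names (`codes`, `uniques`, `counts`) are chosen for illustration and are not from the answer.

```python
import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# factorize: one integer code per element, plus the unique values
# in order of first appearance.
codes, uniques = s.factorize()

# bincount: how many times each code occurs.
counts = np.bincount(codes)

# argmax: position of the largest count, used to look up the value.
most_frequent = uniques[counts.argmax()]

print(codes.tolist())    # [0, 1, 2, 2, 2, 1, 3, 0, 4, 5, 3, 2, 2, 2]
print(list(uniques))     # [1, 5, 3, 2, 8, 10]
print(counts.tolist())   # [2, 2, 6, 2, 1, 1]
print(most_frequent)     # 3
```

Because factorize keeps values in order of first appearance and argmax returns the first maximum, a tie would resolve to whichever tied value appears earliest in the series.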
Upvotes: 3
Reputation: 862511
Use value_counts and select the first value by index:
val = s.value_counts().index[0]
Or use collections.Counter:
from collections import Counter
val = Counter(s).most_common(1)[0][0]
Or a NumPy solution:
_, idx, counts = np.unique(s, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
val = s[index]
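Note that `s[index]` in the NumPy version looks the value up by label when the index holds integers, so it assumes the default 0..n-1 index. A possible variant, sketched here under the assumption that only the value is needed and not its position, reads the answer straight from the sorted unique values that `np.unique` returns. The letter index is hypothetical, added only to show the result does not depend on the index.

```python
import numpy as np
import pandas as pd

# Same data as the question, with a hypothetical non-default index.
s = pd.Series(
    "1 5 3 3 3 5 2 1 8 10 2 3 3 3".split(),
    index=list("abcdefghijklmn"),
).astype(int)

# np.unique returns the sorted unique values and their counts.
values, counts = np.unique(s.to_numpy(), return_counts=True)

# Pick the unique value with the largest count; no lookup into s needed.
val = values[np.argmax(counts)]

print(val)  # 3
```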
Upvotes: 7
Reputation: 631
from scipy import stats
import pandas as pd
x = [1, 5, 3, 3, 3, 5, 2, 1, 8, 10, 2, 3, 3, 3]
data = pd.DataFrame({"values": x})
print(stats.mode(data["values"]))
Output:
ModeResult(mode=array([3], dtype=int64), count=array([6]))
Upvotes: 1