Koray Tugay

Reputation: 23800

How to find values that occur a specific number of times in a pandas Series?

Given the following Series:

import pandas as pd

sr = pd.Series([5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8])

I want to find the values that occur exactly 3 times. This is my solution, which seems to work but looks very strange:

(sr.value_counts() == 3)[sr.value_counts() == 3].index.values

Is there another, more obvious way that I am missing?

Upvotes: 3

Views: 242

Answers (4)

Abhi

Reputation: 4233

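You could also use .where, applied to the counts rather than to sr. Series.where aligns its condition on index labels, so the condition has to share the index of the object being filtered:

counts = sr.value_counts()
counts.where(counts == 3).dropna().index

# Output: an Index containing 5, 6 and 8 (ordering follows value_counts)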
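
A quick check, as a sketch on hypothetical data, of why the condition passed to .where should share the index of the object being filtered. Series.where aligns a Series condition on index labels, so sr.where(sr.value_counts() == 3) compares the row labels of sr (0, 1, 2, ...) with the values held in the counts' index. The series below is an assumption chosen so that values and row labels do not overlap:

```python
import pandas as pd

# Hypothetical data: the values (50, 60, ...) never coincide with row labels (0..10).
sr = pd.Series([50, 50, 50, 60, 60, 60, 70, 70, 80, 80, 80])
counts = sr.value_counts()

# Condition indexed by value, applied to a series indexed by row label:
# no labels match, so every entry is masked and nothing survives dropna().
mismatched = sr.where(counts == 3).dropna().index

# Condition and data share the same index (the distinct values), so this works.
aligned = counts.where(counts == 3).dropna().index

print(len(mismatched))   # number of labels that survived the mismatched filter
print(sorted(aligned))   # values that occur exactly 3 times
```

With the question's data the mismatched form only appears to work because the row labels 5, 6 and 8 happen to equal the values being looked for.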

Upvotes: 1

jpp

Reputation: 164783

Your logic is fine; you just shouldn't repeat the most expensive part, which is the counting. Store the counts in a variable and reuse them. You may also not need to retrieve the underlying NumPy array, since pd.Index objects are often sufficient:

sr = pd.Series([5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8])

counts = sr.value_counts()

res = counts[counts == 3].index
# Int64Index([8, 6, 5], dtype='int64')
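
If this comes up often, the same pattern can be wrapped in a small helper. This is only a sketch; the function name values_occurring and its parameters are hypothetical, not part of pandas:

```python
import pandas as pd


def values_occurring(sr: pd.Series, n: int) -> pd.Index:
    """Return the distinct values of `sr` that occur exactly `n` times."""
    counts = sr.value_counts()      # count once, reuse below
    return counts[counts == n].index


sr = pd.Series([5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8])
three_times = values_occurring(sr, 3)
print(sorted(three_times))  # sorted, since value_counts ordering of ties can vary
```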

There is no ready-made method for what you want because any solution requires at least O(n) time, which is already the complexity of value_counts. There's no way around this.

One alternative, dict-based collections.Counter, will be less efficient when it comes to filtering by count. Since NumPy arrays are stored efficiently in memory, Boolean filtering is efficient relative to dictionary iteration.
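
For comparison, a minimal sketch of the Counter-based alternative mentioned above. It gives the same values but filters with a Python-level loop over the dictionary items instead of a vectorised Boolean mask:

```python
from collections import Counter

import pandas as pd

sr = pd.Series([5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8])

# Count with a plain dict subclass, then filter by iterating over the items.
counter = Counter(sr.tolist())
three_times = [value for value, count in counter.items() if count == 3]

print(three_times)  # [5, 6, 8] (Counter keeps first-seen order)
```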

Upvotes: 3

BENY

Reputation: 323356

Using loc with a callable:

sr.value_counts().loc[lambda x: x == 3].index
# Int64Index([8, 6, 5], dtype='int64')

Upvotes: 2

sacuL

Reputation: 51395

@jpp's answer is probably the one you should go with, but here is a weird alternative (just for fun):

sr.groupby(sr).filter(lambda x: len(x) == 3).unique()
#array([5, 6, 8])

Upvotes: 2
