FooBar

Reputation: 16508

What's the point of XXX in df.column

Let's say I'd like to know if a number is in my pd.DataFrame column.

I'd do:

999 in test.ind
Out[29]: 
True

However, that's odd, given that

test.ind.max()
Out[28]: 
932

and indeed,

(999 == test.ind).sum()
Out[30]: 
0

The column is of type dtype('int64'). Clearly, the x in series expression doesn't do what I expected. Does it serve some other purpose?

What does xx in pd.Series evaluate to?

Upvotes: 0

Views: 1023

Answers (1)

EdChum

Reputation: 394159

It evaluates whether 999 is in your Series index. The in operator calls __contains__, which tests whether the value is among the index labels, not among the values. To test the values, use isin or ==:

In [6]:
s = pd.Series(np.arange(5), index=list('abcde'))
s

Out[6]:
a    0
b    1
c    2
d    3
e    4
dtype: int32

In [7]:
'c' in s

Out[7]:
True

In [8]:
s.isin([2])

Out[8]:
a    False
b    False
c     True
d    False
e    False
dtype: bool
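To tie this back to the question, here is a minimal sketch that reproduces the surprise. The Series ind is a hypothetical stand-in for test.ind (the real data isn't shown), built with a default integer index of 0..999 and values that never exceed 932:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's test.ind:
# 1000 rows with the default index 0..999, values wrapping around at 933.
ind = pd.Series(np.arange(1000) % 933)

in_index = 999 in ind                    # membership test against the index labels
in_values = bool(999 in ind.values)      # membership test against the underlying array
any_equal = bool((ind == 999).any())     # element-wise comparison on the values
any_isin = bool(ind.isin([999]).any())   # isin also works on the values

print(ind.max(), in_index, in_values, any_equal, any_isin)
```

Only the first test looks at the index, which is why it can be True even though no value equals 999.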

__contains__ is implemented like so:

def __contains__(self, item):
    return item in self.items

see: https://github.com/pandas-dev/pandas/blob/master/pandas/core/internals.py#L3358

and the docs

thanks @chrisb

Upvotes: 3
