Ace

Reputation: 35

How to quickly check if a value exists in pandas DataFrame index?

When I update a DataFrame, I need to check whether a value exists in its index. Which of these two ways is faster? Thanks!

1. if value in set(dataframe.index)
2. if value in dataframe.index

Upvotes: 2

Views: 3164

Answers (2)

Hello.World

Reputation: 740

Another way is to time the check yourself:

import time

a = time.time()
if value in set(dataframe.index):
    pass  # value is present in the index
b = time.time()
timetaken = b - a

Upvotes: 0

jezrael

Reputation: 863256

You need the second solution:

value in dataframe.index

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(100000)})
df.index = df.index.astype(np.int64)
print(df.index)


In [64]: %timeit (5000 in df.index)
The slowest run took 37.76 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 523 ns per loop

In [65]: %timeit (5000 in df.index.values)
The slowest run took 5.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 76.7 µs per loop

In [66]: %timeit (5000 in set(df.index))
100 loops, best of 3: 7.34 ms per loop
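The %timeit lines above only run inside IPython. As a rough sketch, the same comparison can be reproduced in a plain script with the standard timeit module. The names value, timings and loops are just illustrative, and the absolute numbers will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(100000)})
df.index = df.index.astype(np.int64)

value = 5000  # illustrative lookup value, same as in the timings above

# All three expressions answer the same question; only the cost differs.
in_index = value in df.index
in_values = value in df.index.values
in_set = value in set(df.index)

loops = 100  # kept small because the set() variant rebuilds the set each time
timings = {
    'value in df.index': timeit.timeit(lambda: value in df.index, number=loops),
    'value in df.index.values': timeit.timeit(lambda: value in df.index.values, number=loops),
    'value in set(df.index)': timeit.timeit(lambda: value in set(df.index), number=loops),
}

for label, seconds in timings.items():
    print(f'{label}: {seconds / loops * 1e6:.1f} us per loop')
```

The set variant pays for building a 100000-element set on every call, which is likely why it comes out slowest here.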

Timings for more data:

df = pd.DataFrame({'A':range(100000)})
df.index = df.index.astype(np.int64)

np.random.seed(2017)
a = np.random.randint(100000, size=1000)

In [73]: %timeit ([i in df.index for i in a])
The slowest run took 4.36 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 636 µs per loop

In [74]: %timeit ([i in df.index.values for i in a])
1 loop, best of 3: 208 ms per loop

In [75]: %timeit ([i in set(df.index) for i in a])
1 loop, best of 3: 7.44 s per loop
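Much of the cost in the last timing comes from calling set(df.index) once per element. If a set is really wanted, it can be built once and reused. For many values, a vectorized check with Index.isin is another option. This is only a sketch on the same sample data; index_set, by_set and by_isin are illustrative names, and it has not been timed here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(100000)})
df.index = df.index.astype(np.int64)

np.random.seed(2017)
a = np.random.randint(100000, size=1000)

# Per-element lookup against the index, as in the timings above.
by_index = [i in df.index for i in a]

# Build the set once, outside the loop, then reuse it for every lookup.
index_set = set(df.index)
by_set = [i in index_set for i in a]

# Vectorized alternative: one boolean per element of a.
by_isin = pd.Index(a).isin(df.index)

print(all(by_index), all(by_set), by_isin.all())
```

All three approaches give the same answers; they differ only in how the lookups are carried out.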

Upvotes: 3
