Reputation: 35
When I update a DataFrame, I need to check whether a value already exists in its index. Which of these two ways is faster? Thanks!
1. if value in set(dataframe.index)
2. if value in dataframe.index
Upvotes: 2
Views: 3164
Reputation: 740
Another way is to time it yourself:
import time
a = time.time()
found = value in set(dataframe.index)  # the membership check being timed
b = time.time()
timetaken = b - a
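A single time.time() pair can be noisy for operations this short, so repeating the check with the standard-library timeit module may give a steadier number. The following is only a sketch: the frame `df`, the looked-up `value`, and the repeat counts are made-up placeholders, not taken from the question.

```python
import timeit

import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({'A': range(100000)})
value = 5000

n_index, n_set = 1000, 10  # arbitrary repeat counts

# Total seconds for n repetitions of each membership check
t_index = timeit.timeit(lambda: value in df.index, number=n_index)
t_set = timeit.timeit(lambda: value in set(df.index), number=n_set)

print('in df.index      :', t_index / n_index, 's per check')
print('in set(df.index) :', t_set / n_set, 's per check')
```

The absolute numbers depend on the machine and the pandas version, so only the comparison between the two lines is meaningful.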
Upvotes: 0
Reputation: 863256
You need the second solution:
value in dataframe.index
Sample:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A':range(100000)})
df.index = df.index.astype(np.int64)
print(df.index)
In [64]: %timeit (5000 in df.index)
The slowest run took 37.76 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 523 ns per loop
In [65]: %timeit (5000 in df.index.values)
The slowest run took 5.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 76.7 µs per loop
In [66]: %timeit (5000 in set(df.index))
100 loops, best of 3: 7.34 ms per loop
Timings for 1000 random lookups:
df = pd.DataFrame({'A':range(100000)})
df.index = df.index.astype(np.int64)
np.random.seed(2017)
a = np.random.randint(100000, size=1000)
In [73]: %timeit ([i in df.index for i in a])
The slowest run took 4.36 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 636 µs per loop
In [74]: %timeit ([i in df.index.values for i in a])
1 loop, best of 3: 208 ms per loop
In [75]: %timeit ([i in set(df.index) for i in a])
1 loop, best of 3: 7.44 s per loop
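Note that set(df.index) builds a new set from the whole index on every check, which is where most of its time goes in the loops above. If a set is really wanted, it can be built once outside the loop. Below is a minimal sketch of that idea, plus a vectorized check with Index.isin. The names `lookups`, `index_set` and `mask` are made up for illustration, and the lookup range is widened here so that some values are missing from the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(100000)})

# Hypothetical lookup values; roughly half fall outside the index
rng = np.random.default_rng(2017)
lookups = rng.integers(0, 200000, size=1000)

# Option 1: membership directly on the index
in_index = [i in df.index for i in lookups]

# Option 2: build the set once, then reuse it for every check
index_set = set(df.index)
in_set = [i in index_set for i in lookups]

# Option 3: one vectorized call instead of a Python loop
mask = pd.Index(lookups).isin(df.index)

print(sum(in_index), 'of', len(lookups), 'values found in the index')
```

All three give the same answers; they differ only in how much work is repeated per check.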
Upvotes: 3