Vineet Kumar Doshi
Vineet Kumar Doshi

Reputation: 4870

Why is string search fastest as compared to list search and set search?

I was solving some competitive coding coding challenges. My solution was exceeding time limit when I was searching in list, but was accepted when I searched in string.

Below is a sample code:

CODE

import timeit

list = timeit.Timer("'A' in list('AEIOU')")
set = timeit.Timer("'A' in set('AEIOU')")
string = timeit.Timer("'A' in 'AEIOU'")

print 'List search :',list.timeit(999999)
print 'Set search',set.timeit(999999)
print 'String search',string.timeit(999999)

OUTPUT

List search : 1.0873670578
Set search 1.0083398819
String search 0.0997061729431

*This output was obtained in ideone.com ... http://ideone.com/5FxkX2
**Output may vary from system to system but the trend is same.

I found that string search is incredibly faster. I googled but didn't found a satisfactory reasoning to this. Please help me understand why is string search fastest?

Upvotes: 3

Views: 293

Answers (2)

Martijn Pieters
Martijn Pieters

Reputation: 1122282

This has very little to do with actual search speed and everything with having to look up a global name (set or list) and calling that global (which includes a frame save and restore).

Test the actual in operator alone, not with the call:

>>> import timeit
>>> timeit.Timer("'A' in l", "l = list('AEIOU')").timeit(999999)
0.0490870475769043
>>> timeit.Timer("'A' in s", "s = set('AEIOU')").timeit(999999)
0.0507349967956543
>>> timeit.Timer("'A' in s", "s = 'AEIOU'").timeit(999999)
0.04982495307922363

Now the times are too close to call, because your objects are way too small. Use much larger inputs to actually see a difference.

Just using more characters already increases the contrast somewhat:

>>> timeit.Timer("'A' in l", "from string import ascii_letters, digits, whitespace; l = list(ascii_letters + digits + whitespace)").timeit(999999)
0.38520193099975586
>>> timeit.Timer("'A' in s", "from string import ascii_letters, digits, whitespace; s = set(ascii_letters + digits + whitespace)").timeit(999999)
0.046868085861206055
>>> timeit.Timer("'A' in s", "from string import ascii_letters, digits, whitespace; s = ascii_letters + digits + whitespace").timeit(999999)
0.06492304801940918

Now set() objects are starting to win, and list objects show their slower speed.

You should also test for the negative case, where the character is not present in the object:

>>> timeit.Timer("'\xff' in l", "from string import ascii_letters, digits, whitespace; l = list(ascii_letters + digits + whitespace)").timeit(999999)
0.909243106842041
>>> timeit.Timer("'\xff' in s", "from string import ascii_letters, digits, whitespace; s = set(ascii_letters + digits + whitespace)").timeit(999999)
0.05034899711608887
>>> timeit.Timer("'\xff' in s", "from string import ascii_letters, digits, whitespace; s = ascii_letters + digits + whitespace").timeit(999999)
0.09932804107666016

Here the set() object really excels.

Note that the string search still beats the list search in both cases; that's because iterating over a C array and comparing bytes is faster than comparing objects in a dynamic array, where you cannot up-front know that all the objects are the same type.

Upvotes: 4

Cory Kramer
Cory Kramer

Reputation: 117886

Your test is biased, you need to use the setup argument to neglect the time to create the list or set

>>> timeit.timeit("'A' in l", "l = list('AEIOU')")
0.06389708405846951
>>> timeit.timeit("'A' in s", "s = set('AEIOU')")
0.05960524183085081
>>> timeit.timeit("'A' in s", "s = 'AEIOU'")
0.05756433387793081

That being said, the in operation for str and list is linear O(N) and set is constant O(1) (neglecting lots of hash collisions). So for a much larger example, the set would be the fastest, and str and list would be very similar.

Upvotes: 6

Related Questions