jizhihaoSAMA
jizhihaoSAMA

Reputation: 12672

Speed test among set, list and tuple in Python gives surprising results

I tried to test the speed among set,list and tuple and got surprising results.

Before this, I have known that set is faster than list based on this answer.

Here is my test code:

import timeit,time
from sys import getsizeof as Size

List_Test = [range(1000)]
print("The Size of List is : {}".format(Size(List_Test)))
Set_Test = set(range(1000))
print("The Size of Set is : {}".format(Size(Set_Test)))
Tuple_Test = tuple(range(1000))
print("The Size of Tuple is : {}".format(Size(Tuple_Test)))


print("\nNow is to test speed\n")
time.sleep(3)

def Create_List():
    List = [i for i in range(1000)]

def Test_List():
    for i in List_Test:
        if i == 6:
            break

def Create_Set():
    Set = set(i for i in range(1000))

def Test_Set():
    for i in Set_Test:
        if i == 6:
            break

def Create_Tuple():
    Tuple = tuple(i for i in range(1000))

def Test_Tuple():
    for i in Tuple_Test:
        if i == 6:
            break

t = timeit.repeat(stmt="Create_List()",number=1000,setup="from __main__ import Create_List", repeat=30)
print("The Time of Create_List : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Tuple()",number=1000,setup="from __main__ import Create_Tuple", repeat=30)
print("The Time of Create_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Set()",number=1000,setup="from __main__ import Create_Set", repeat=30)
print("The Time of Create_Set : {}".format(sum(t)/len(t)))

print("\n")

t = timeit.repeat(stmt="Test_List()",number=1000,setup="from __main__ import Test_List", repeat=30)
print("The Time of Test_List : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Tuple()",number=1000,setup="from __main__ import Test_Tuple", repeat=30)
print("The Time of Test_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Set()",number=1000,setup="from __main__ import Test_Set", repeat=30)
print("The Time of Test_Set : {}".format(sum(t)/len(t)))

print("\nThe end")

Finally, I found that:

Size: Set > Tuple > List
Speed: List > Tuple > Set

I think my test code is wrong. What's wrong with it?


Edit:

I changed the test code:

List_Test = list(range(500000))
......

def Test_List():
    randomNumber = random.randint(0,500000)
    for i in List_Test:
        if i == randomNumber:
            break

# Other test code is the same

And the result of the test always is list ≈ tuple > set.

When change the number 500000(x20) to 10000000,

It sometimes is list ≈ tuple ≈ set,but often list ≈ tuple > set.

May I infer that only when all of them have the same length(And the number of length is large),we can use set (Although its size is much larger than tuple and list)?

Upvotes: 0

Views: 2500

Answers (3)

ggorlen
ggorlen

Reputation: 56965

There are many problems with this test suite.

[range(1000)] isn't equivalent to the other two declarations. This makes a list with one element, and that single element points to the range. getsizeof is not recursive, so it only gives the size of the outer object. This can be illustrated by creating lists of a single element of different range sizes and noticing that the getsizeof call always gives the same result.

If you use list(range(1000)), you'll get a reasonable result that's about the same size as a tuple. I'm not sure what information is gained in making these lists 1000, though--why not compare sizes of the three empty elements? Even here, this is basically an implementation-dependent test that doesn't really have much bearing on how you might write a Python program, as far as I can tell. Sure, set is a bit larger as you'd expect (hash-based structures generally have more overhead than lists) and that varies from version to version.

As for the "create" test, consider set(i for i in range(1000)). In no way does this test the time it takes to create a set because most of the time is spent creating and running the generator that's being passed as a parameter. As with the last test, I'm not sure what this proves even if the test were fair (i.e. you used the list() initializer instead of a list comprehension). As with the "size" tests, you can just call the initializers with empty lists, but all this shows is that creation times are practically the same: there's some function call and object allocation overhead which is implementation-specific.

Lastly, the speed tests for lookup operations:

for i in Set_Test:
    if i == 6:
        break

This hardcodes a best-case scenario. List and tuple lookups perform a linear search, so the target is always found in 7 comparisons. This is going to be nearly identical to a set lookup, which is O(1) but requires some complicated hashing operations that add overhead. This test should use random indices where the distribution means the lists and tuples will be hit with worst-case scenarios regularly. The set should outperform lists and tuples done correctly.

Furthermore, a statement like "a set is faster than a list" has almost no meaning--data structures can't be compared like this, and a word like "faster" speaks to specific run-time conditions that are highly variable and case-specific. Comparing the data structures requires comparison of their operations' time complexities which help describe their fitness for some purpose.

To summarize: the first two tests aren't worth performing even if written correctly and the last test will show that set lookups are O(1) while list/tuple lookups are O(n) if they're made fair using random target items.

Upvotes: 5

dstromberg
dstromberg

Reputation: 7177

Maybe try:

#!/usr/local/cpython-3.8/bin/python3

import timeit,time
from sys import getsizeof as Size

List_Test = list(range(1000))
print("The Size of List is  : {}".format(Size(List_Test)))
Set_Test = set(range(1000))
print("The Size of Set is   : {}".format(Size(Set_Test)))
Tuple_Test = tuple(range(1000))
print("The Size of Tuple is : {}".format(Size(Tuple_Test)))


print("\nNow is to test speed\n")
time.sleep(3)

def Create_List():
    List = [i for i in range(1000)]

def Test_List():
    if 500 in List_Test:
        pass

def Create_Set():
    Set = set(i for i in range(1000))

def Test_Set():
    if 500 in Set_Test:
        pass

def Create_Tuple():
    Tuple = tuple(i for i in range(1000))

def Test_Tuple():
    if 500 in Tuple_Test:
        pass

t = timeit.repeat(stmt="Create_List()",number=1000,setup="from __main__ import Create_List", repeat=30)
print("The Time of Create_List  : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Tuple()",number=1000,setup="from __main__ import Create_Tuple", repeat=30)
print("The Time of Create_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Set()",number=1000,setup="from __main__ import Create_Set", repeat=30)
print("The Time of Create_Set   : {}".format(sum(t)/len(t)))

print("\n")

t = timeit.repeat(stmt="Test_List()",number=1000,setup="from __main__ import Test_List", repeat=30)
print("The Time of Test_List  : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Tuple()",number=1000,setup="from __main__ import Test_Tuple", repeat=30)
print("The Time of Test_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Set()",number=1000,setup="from __main__ import Test_Set", repeat=30)
print("The Time of Test_Set   : {}".format(sum(t)/len(t)))

print("\nThe end")

Upvotes: 0

blhsing
blhsing

Reputation: 106552

List_Test = [range(1000)] creates a one-item list that contains a range object without actually making the range generator produce any output.

You should use the list constructor instead to actually create a list with the generator output of the range object unpacked:

List_Test = list(range(1000))

Upvotes: 2

Related Questions