Reputation: 12672
I tried to test the speed among set
,list
and tuple
and got surprising results.
Before this, I have known that set
is faster than list
based on this answer.
Here is my test code:
import timeit,time
from sys import getsizeof as Size
List_Test = [range(1000)]
print("The Size of List is : {}".format(Size(List_Test)))
Set_Test = set(range(1000))
print("The Size of Set is : {}".format(Size(Set_Test)))
Tuple_Test = tuple(range(1000))
print("The Size of Tuple is : {}".format(Size(Tuple_Test)))
print("\nNow is to test speed\n")
time.sleep(3)
def Create_List():
List = [i for i in range(1000)]
def Test_List():
for i in List_Test:
if i == 6:
break
def Create_Set():
Set = set(i for i in range(1000))
def Test_Set():
for i in Set_Test:
if i == 6:
break
def Create_Tuple():
Tuple = tuple(i for i in range(1000))
def Test_Tuple():
for i in Tuple_Test:
if i == 6:
break
t = timeit.repeat(stmt="Create_List()",number=1000,setup="from __main__ import Create_List", repeat=30)
print("The Time of Create_List : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Tuple()",number=1000,setup="from __main__ import Create_Tuple", repeat=30)
print("The Time of Create_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Set()",number=1000,setup="from __main__ import Create_Set", repeat=30)
print("The Time of Create_Set : {}".format(sum(t)/len(t)))
print("\n")
t = timeit.repeat(stmt="Test_List()",number=1000,setup="from __main__ import Test_List", repeat=30)
print("The Time of Test_List : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Tuple()",number=1000,setup="from __main__ import Test_Tuple", repeat=30)
print("The Time of Test_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Set()",number=1000,setup="from __main__ import Test_Set", repeat=30)
print("The Time of Test_Set : {}".format(sum(t)/len(t)))
print("\nThe end")
Finally, I found that:
Size: Set > Tuple > List
Speed: List > Tuple > Set
I think my test code is wrong. What's wrong with it?
Edit:
I changed the test code:
List_Test = list(range(500000))
......
def Test_List():
randomNumber = random.randint(0,500000)
for i in List_Test:
if i == randomNumber:
break
# Other test code is the same
And the result of the test always is list ≈ tuple > set
.
When change the number 500000
(x20) to 10000000
,
It sometimes is list ≈ tuple ≈ set
,but often list ≈ tuple > set
.
May I infer that only when all of them have the same length(And the number of length is large),we can use set
(Although its size is much larger than tuple
and list
)?
Upvotes: 0
Views: 2500
Reputation: 56965
There are many problems with this test suite.
[range(1000)]
isn't equivalent to the other two declarations. This makes a list with one element, and that single element points to the range. getsizeof
is not recursive, so it only gives the size of the outer object. This can be illustrated by creating lists of a single element of different range sizes and noticing that the getsizeof
call always gives the same result.
If you use list(range(1000))
, you'll get a reasonable result that's about the same size as a tuple. I'm not sure what information is gained in making these lists 1000, though--why not compare sizes of the three empty elements? Even here, this is basically an implementation-dependent test that doesn't really have much bearing on how you might write a Python program, as far as I can tell. Sure, set
is a bit larger as you'd expect (hash-based structures generally have more overhead than lists) and that varies from version to version.
As for the "create" test, consider set(i for i in range(1000))
. In no way does this test the time it takes to create a set
because most of the time is spent creating and running the generator that's being passed as a parameter. As with the last test, I'm not sure what this proves even if the test were fair (i.e. you used the list()
initializer instead of a list comprehension). As with the "size" tests, you can just call the initializers with empty lists, but all this shows is that creation times are practically the same: there's some function call and object allocation overhead which is implementation-specific.
Lastly, the speed tests for lookup operations:
for i in Set_Test:
if i == 6:
break
This hardcodes a best-case scenario. List and tuple lookups perform a linear search, so the target is always found in 7 comparisons. This is going to be nearly identical to a set lookup, which is O(1) but requires some complicated hashing operations that add overhead. This test should use random indices where the distribution means the lists and tuples will be hit with worst-case scenarios regularly. The set should outperform lists and tuples done correctly.
Furthermore, a statement like "a set is faster than a list" has almost no meaning--data structures can't be compared like this, and a word like "faster" speaks to specific run-time conditions that are highly variable and case-specific. Comparing the data structures requires comparison of their operations' time complexities which help describe their fitness for some purpose.
To summarize: the first two tests aren't worth performing even if written correctly and the last test will show that set lookups are O(1) while list/tuple lookups are O(n) if they're made fair using random target items.
Upvotes: 5
Reputation: 7177
Maybe try:
#!/usr/local/cpython-3.8/bin/python3
import timeit,time
from sys import getsizeof as Size
List_Test = list(range(1000))
print("The Size of List is : {}".format(Size(List_Test)))
Set_Test = set(range(1000))
print("The Size of Set is : {}".format(Size(Set_Test)))
Tuple_Test = tuple(range(1000))
print("The Size of Tuple is : {}".format(Size(Tuple_Test)))
print("\nNow is to test speed\n")
time.sleep(3)
def Create_List():
List = [i for i in range(1000)]
def Test_List():
if 500 in List_Test:
pass
def Create_Set():
Set = set(i for i in range(1000))
def Test_Set():
if 500 in Set_Test:
pass
def Create_Tuple():
Tuple = tuple(i for i in range(1000))
def Test_Tuple():
if 500 in Tuple_Test:
pass
t = timeit.repeat(stmt="Create_List()",number=1000,setup="from __main__ import Create_List", repeat=30)
print("The Time of Create_List : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Tuple()",number=1000,setup="from __main__ import Create_Tuple", repeat=30)
print("The Time of Create_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Set()",number=1000,setup="from __main__ import Create_Set", repeat=30)
print("The Time of Create_Set : {}".format(sum(t)/len(t)))
print("\n")
t = timeit.repeat(stmt="Test_List()",number=1000,setup="from __main__ import Test_List", repeat=30)
print("The Time of Test_List : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Tuple()",number=1000,setup="from __main__ import Test_Tuple", repeat=30)
print("The Time of Test_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Set()",number=1000,setup="from __main__ import Test_Set", repeat=30)
print("The Time of Test_Set : {}".format(sum(t)/len(t)))
print("\nThe end")
Upvotes: 0
Reputation: 106552
List_Test = [range(1000)]
creates a one-item list that contains a range
object without actually making the range
generator produce any output.
You should use the list
constructor instead to actually create a list with the generator output of the range
object unpacked:
List_Test = list(range(1000))
Upvotes: 2