Shashank Sharma
Shashank Sharma

Reputation: 405

Python Comparing 2 list and 2 tuple with equality sign

Recently I have been learning more about hashing in Python and I came around this blog where it states that:

Suppose a Python program has 2 lists. If we need to think about comparing those two lists, what will you do? Compare each element? Well, that sounds fool-proof but slow as well!

Python has a much smarter way to do this. When a tuple is constructed in a program, Python Interpreter calculates its hash in its memory. If a comparison occurs between 2 tuples, it simply compares the hash values and it knows if they are equal!

So I am really confused about these statements.

First when we do: [1, 2, 3] == [1, 2, 3] then how does this equality works ? Does it calculate the hash value and then compare it ?

Second what's the difference when we do:

[1, 2, 3] == [1, 2, 3] and (1, 2, 3) == (1, 2, 3) ?

Because when I tried to find time of execution with timeit then I got this result:

$ python3.5 -m timeit '[1, 2, 3] == [1, 2, 3]'
10000000 loops, best of 3: 0.14 usec per loop
$ python3.5 -m timeit '(1, 2, 3) == (1, 2, 3)'
10000000 loops, best of 3: 0.0301 usec per loop

So why there is difference in time from 0.14 for list to 0.03 for tuple which is faster than list.

Upvotes: 2

Views: 1670

Answers (2)

abarnert
abarnert

Reputation: 365767

Well, part of your confusion is that the blog post you're reading is just wrong. About multiple things. Try to forget that you ever read it (except to remember the site and the author's name so you'll know to avoid them in the future).

It is true that tuples are hashable and lists are not, but that isn't relevant to their equality-testing functions. And it's certainly not true that "it simply compares the hash values and it knows if they are equal!" Hash collisions happen, and ignoring them would lead to horrible bugs, and fortunately Python's developers are not that stupid. In fact, it's not even true that Python computes the hash value at initialization time.*

There actually is one significant difference between tuples and lists (in CPython, as of 3.6), but it usually doesn't make much difference: Lists do an extra check for unequal length at the beginning as an optimization, but the same check turned out to be a pessimization for tuples,** so it was removed from there.

Another, often much more important, difference is that tuple literals in your source get compiled into constant values, and separate copies of the same tuple literal get folded into the same constant object; that doesn't happen with lists, for obvious reasons.

In fact, that's what you're really testing with your timeit. On my laptop, comparing the tuples takes 95ns, while comparing the lists takes 169ns—but breaking it down, that's actually 93ns for the comparison, plus an extra 38ns to create each list. To make it a fair comparison, you have to move the creation to a setup step, and then compare already-existing values inside the loop. (Or, of course, you may not want to be fair—you're discovering the useful fact that every time you use a tuple constant instead of creating a new list, you're saving a significant fraction of a microsecond.)

Other than that, they basically do the same thing. Translating the C source into Python-like pseudocode (and removing all the error handling, and the stuff that makes the same function work for <, and so on):

for i in range(min(len(v), len(w))):
    if v[i] != w[i]:
        break
else:
    return len(v) == len(w)
return False

The list equivalent is like this:

if len(v) != len(w):
    return False
for i in range(min(len(v), len(w))):
    if v[i] != w[i]:
        break
else:
    return True
return False

* In fact, unlike strings, tuples don't even cache their hashes; if you call hash over and over, it'll keep re-computing it. See issue 9685, where a patch to change that was rejected because it slowed down some benchmarks and didn't speed up anything anyone could find.

** Not because of anything inherent about the implementation, but because people often compare lists of different lengths, but rarely do with tuples.

Upvotes: 8

Roushan
Roushan

Reputation: 4420

The answer was given in that article too :)

here is the demonstrated :

>>> l1=[1,2,3]
>>> l2=[1,2,3]
>>> 
>>> hash(l1) #since list is not hashable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> t=(1,2,3)
>>> t2=(1,2,3)
>>> hash(t)
2528502973977326415
>>> hash(t2)
2528502973977326415
>>> 

in above when you call hash on list it will give you TypeError as it is not hashable and for equality of two list python check its inside value that will lead to take much time

in case of tuple it calculates the hash value and for two similar tuplle having the same hash value so python only compared the hashvalue of tuple so it us much fast than list

from given article

Python has a much smarter way to do this. When a tuple is constructed in a program, Python Interpreter calculates its hash in its memory. If a comparison occurs between 2 tuples, it simply compares the hash values and it knows if they are equal!

Upvotes: 0

Related Questions