Bleeding Fingers
Bleeding Fingers

Reputation: 7129

Why do two different equality tests between two strings of same length take different amounts of time

According to the Python reference manual:

Strings are compared lexicographically using the numeric equivalents (the result of the built-in function ord()) of their characters. Unicode and 8-bit strings are fully interoperable in this behavior.

Meaning hashing is not used for this purpose.

Now, lets assume that internally during an equality test Python first checks the length of the two strings and proceeds with the lexicographical comparison, if both are of same length (I suppose it does the same with all the other comparisons too).

Well, if that is so then why does the following two, different, comparisons consume different amounts of time?

>>> str1 = "foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge1"
>>> 
>>> str2 = "foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge2"
>>> 
>>> def compare():
     t1 = time.time()
     for x in xrange(100000000):
         str1 == str2
     print time.time() - t1
 
>>> 
>>> compare()
13.001019001
>>> 
>>> str2 = "foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge1"
>>> 
>>> compare()
7.41645097733

In both the comparisons str1 and str2 are of the same length. In the first one both differ only by the last character and in the second by none.

PS: Used a long strings and large iterations to make the difference appreciable.

Upvotes: 2

Views: 114

Answers (4)

Gregor
Gregor

Reputation: 4434

In the second case, the two strings are in fact the same object, having the same id.

str1 = "foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge1"
str2 = "foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge2"

def compare():
    t1 = time.time()
    for x in xrange(100000000):
        str1 == str2
    print time.time() - t1


print "%d, %d" % (id(str1), id(str2))
compare()

str2 = "foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge_foo_bar_cruft_kludge1"

print "%d, %d" % (id(str1), id(str2))
compare()

In the second case, the ids are identical, therefore the python interpreter will not have to compare the whole string byte-by-byte.

Upvotes: 1

actual
actual

Reputation: 2370

There is some pool of string literals in Python, as in many languages, in the second case both strings are, in fact, the same object, so they are compared by references, not by actual values.

Typical string comparison function in reference-based languages:

if (ref(a) == ref(b)) return true;
if (len(a) != len(b)) return false;
return compare_actual_data(a, b);

Upvotes: 1

Ignacio Vazquez-Abrams
Ignacio Vazquez-Abrams

Reputation: 799400

As the other answers surmise, the comparison is indeed short-circuited if the strings are identical, which is the case for string literals with the same content.

Upvotes: 1

James
James

Reputation: 25543

If two strings are 'interned' then they can be compared by a pointer comparison. Maybe in this case the byte-code generator is noticing the strings are constants and interning them?

You could demonstrate (or disprove) this by showing that the smaller timing is largely independent (or not) of the length of the string.

And indeed, I get these results:

$ ./test.py 
17.4877259731
8.61179995537
$ ./test_halflengthstrings.py 
12.6504788399
8.58732700348

Upvotes: 0

Related Questions