Cool Javelin

Reputation: 788

Performance: (Compare string) vs (convert to int)

Hi all: I am new to Stack Overflow and fairly new to Python, but I have been writing code for years, and I would like to know which of the following performs better.

Assume I have imported environ from os, and the flag in the environment is guaranteed to be either "0" or "1".

if environ["Flag"] == "1":
    do_something()

or

if int(environ["Flag"]) == 1:
    do_something()

At first glance, converting to int and then comparing looks slower because of the conversion. However, I know string comparisons can also be slow.

Has anyone ever examined this?

Thanks, Mark.

Upvotes: 7

Views: 7694

Answers (4)

Mahesha999

Reputation: 24731

Below is a quick and dirty comparison.

import time

s1 = '10000000000000000000001'
s2 = '10000000000000000000002'

Approach 1: Does not type cast the strings to ints

  • Makes sense only when the numeric strings are of equal length, because strings compare lexicographically: '1000' < '2' is true since '1' < '2'.

  • Preferable when you receive the numbers at different times in the execution flow.

  • One such example scenario:

    t1 = time.time()
    for i in range(10000000):
        if s1 < s2:
            pass
    print(time.time() - t1)


Example output:

0.5940780639648438
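
The caveat in the first bullet can be checked directly. Here is a minimal sketch, using made-up values that are not part of the timing runs, of how lexicographic and numeric ordering disagree once the strings differ in length:

```python
# Illustrative values only; they are not taken from the benchmark.
short, long_ = "2", "1000"

# String comparison goes character by character, so "1" < "2" decides it.
lexicographic = long_ < short

# Numeric comparison looks at the values 1000 and 2.
numeric = int(long_) < int(short)

print(lexicographic)  # True
print(numeric)        # False

# Equal-length strings of plain digits order the same way in both schemes.
same_length_agrees = ("1000" < "2000") == (int("1000") < int("2000"))
print(same_length_agrees)  # True
```

So approach 1 is only safe when the inputs are known to be plain digit strings of the same length.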

Approach 2: Type casts the strings every time they are compared. Preferable when

  • you are dealing with numeric inequality involving numeric strings of different lengths (for equality comparison of numeric strings of different lengths there is no need for type casting), and

  • you receive the numbers at different times, that is, not all at once at the beginning, in which case approach 3 is more suitable.

    t1 = time.time()
    for i in range(10000000):
        if int(s1) < int(s2):
            pass
    print(time.time() - t1)


Example output:

4.108525276184082

Approach 3: Type casts the strings once, up front, and then compares the resulting ints.

n1 = int(s1)
n2 = int(s2)
t1 = time.time()
for i in range(10000000):
    if n1<n2:
        pass
print(time.time() - t1)

Example output:

0.5334858894348145
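
The same three approaches can also be timed with the standard timeit module, which avoids hand-written loops around time.time(). This is only a sketch: the iteration count N and the namespace dict are my own choices, and the absolute numbers will differ from the ones shown above.

```python
import timeit

s1 = "10000000000000000000001"
s2 = "10000000000000000000002"
n1, n2 = int(s1), int(s2)

# Hypothetical iteration count, kept small so the script finishes quickly.
N = 100_000

# Names the timed statements are allowed to see.
namespace = {"s1": s1, "s2": s2, "n1": n1, "n2": n2}

results = {
    "compare strings": timeit.timeit("s1 < s2", globals=namespace, number=N),
    "convert every time": timeit.timeit("int(s1) < int(s2)", globals=namespace, number=N),
    "convert once": timeit.timeit("n1 < n2", globals=namespace, number=N),
}

for label, seconds in results.items():
    print(f"{label}: {seconds:.4f} s")
```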

Upvotes: 0

user1342784

Reputation:

The others are right in that when in doubt, time it.

But here's a bit of explanation:

When you compare two strings, the algorithm looks something like this:

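def strings_equal(a, b):
    if len(a) != len(b):      # different lengths can never be equal
        return False
    for x, y in zip(a, b):    # walk both strings in step
        if x != y:            # stop at the first mismatch
            return False
    return True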

So the speed of a string comparison depends on how much of the two strings is equal. In your example you are comparing against "1", a one-character string, so in your case it boils down to:

if environ["Flag"][0] == "1"[0]

In other words, it is comparing a single byte to another single byte. Obviously a single comparison is going to be fast.

In your second case, you convert the string to an int. This takes a bit of time. But if we assume best case, and that the flag is always "0" or "1", it's probably something like:

i = ord(s[0]) - ord("0")

Then you compare two integers. Integers are four bytes, not one, but that probably doesn't matter on modern chips.

But in any case, this means that when you compare two strings, you are doing a single comparison. When you convert to int, you are doing the work of the conversion, then doing a single comparison. Hence, the string comparison is faster.

But again, this is situational. It is faster because you are comparing two strings of length 1. Comparing two ints takes constant time, while comparing two strings takes time proportional to the length of the shorter string.
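
To see the dependence on string length for yourself, here is a rough sketch that compares two long strings that differ at the first character against two that differ only at the last. The size and repeat count are arbitrary values I picked, and the helper name time_compare is made up for this example; I would expect the late mismatch to take longer, but measure it on your own machine.

```python
import timeit

size = 1_000_000  # hypothetical length, large enough to make the effect visible

base = "a" * size
differs_first = "b" + "a" * (size - 1)   # mismatch at index 0
differs_last = "a" * (size - 1) + "b"    # mismatch at the final index

def time_compare(other, number=200):
    """Return the seconds taken by `number` equality checks of base against other."""
    return timeit.timeit(lambda: base == other, number=number)

early = time_compare(differs_first)
late = time_compare(differs_last)

print(f"mismatch at start: {early:.6f} s")
print(f"mismatch at end:   {late:.6f} s")
```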

Finally, taking a flag out of an environment variable is something you do only once per run. We're talking about a couple hundred nanoseconds in something you do once. Differences of that scale are only worth worrying about in loops that run many, many times. In this case, don't bother with performance and worry about what reads better. (Which is probably still the string comparison version.)
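
As a sketch of the "do it once" point: read the variable a single time at startup and keep the result as a boolean. The helper name flag_enabled and the choice to treat an unset variable as "0" are my assumptions; the question says the value is always "0" or "1".

```python
import os

def flag_enabled(env=os.environ, name="Flag"):
    """Return True only when the variable is exactly "1".

    Assumption: an unset variable counts as "0".
    """
    return env.get(name, "0") == "1"

# Evaluate once at startup and reuse the boolean afterwards.
FLAG_ENABLED = flag_enabled()

if FLAG_ENABLED:
    print("flag is set")
```

After that, every later check is a plain `if FLAG_ENABLED:` and the string-versus-int question no longer comes up.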

Upvotes: 2

Padraic Cunningham

Reputation: 180441

In [44]: timeit int("1") == 1
1000000 loops, best of 3: 380 ns per loop

In [44]: timeit "1" == "1"
10000000 loops, best of 3: 36.5 ns per loop

Casting to int will always be slower, which makes perfect sense: you start out with a string and then convert it to an int, instead of just creating a string.

Converting is the most costly part:

In [45]: timeit 1
100000000 loops, best of 3: 11.9 ns per loop

In [46]: timeit "1" 
100000000 loops, best of 3: 11 ns per loop

In [47]: timeit int("1")
1000000 loops, best of 3: 366 ns per loop

There is a difference between creating a string with a = "1" and doing a = 1 followed by b = str(a), which is where you may have gotten confused.

In [3]: a = 1

In [4]: timeit str(a)
10000000 loops, best of 3: 135 ns per loop

Timed using Python 2.7; the difference using Python 3 is pretty much the same.

The output is from my IPython terminal, using the IPython magic timeit function.

Upvotes: 6

Marcin

Reputation: 238279

Why not check it yourself?

import timeit

print(timeit.timeit('a="1"; a == "1"', number=10000))
print(timeit.timeit('a="1"; int(a) == 1', number=10000))

The result for me is:

0.0003461789892753586
0.0019836849969578907

Which would indicate that the string comparison is much faster.
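
One caveat about the snippet above: the assignment a="1" is part of the timed statement, so it is measured along with the comparison. A possible variation, sketched here with a repeat count I chose arbitrarily, moves the assignment into setup and takes the best of several runs:

```python
import timeit

# setup runs once per timing run and is not included in the measurement.
setup = 'a = "1"'

string_times = timeit.repeat('a == "1"', setup=setup, number=10000, repeat=5)
int_times = timeit.repeat('int(a) == 1', setup=setup, number=10000, repeat=5)

# The minimum is usually the least noisy figure.
print(min(string_times))
print(min(int_times))
```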

Upvotes: 6
