lch

Reputation: 4951

Different ways to convert a byte into an integer in Python

from binascii import unhexlify
import time
import struct

var = 'FF'
b = unhexlify(var)  # b'\xff'; named b so it doesn't shadow the builtin bytes

# Approach 1: struct.unpack with format 'B' (unsigned char)
start_time = time.time()
for i in range(10000):
    temp = struct.unpack('B', b)[0]
print("- %s milli sec -" % ((time.time() - start_time)*1000))


# Approach 2: int.from_bytes, interpreting the bytes as big-endian
start_time = time.time()
for i in range(10000):
    temp = int.from_bytes(b, byteorder='big')
print("- %s milli sec -" % ((time.time() - start_time)*1000))


# Approach 3: index the bytes object directly (yields an int in Python 3)
start_time = time.time()
for i in range(10000):
    temp = b[0]
print("- %s milli sec -" % ((time.time() - start_time)*1000))

output

- 5.327939987182617 milli sec -
- 12.086629867553711 milli sec -
- 1.882314682006836 milli sec -

Obviously, the 3rd one is a lot faster than the others.

Is there a technical reason for this? Also, can someone tell me the pros and cons of these approaches? If there is a better way to achieve this, please explain it in an answer.

program available here: https://repl.it/repls/AcceptableCompatiblePrograms

Upvotes: 0

Views: 143

Answers (1)

abarnert

Reputation: 366133

The first problem is that you're not benchmarking properly (you should be using timeit, because it takes care of the things you didn't think of, such as using the highest-resolution clock and turning off garbage collection while timing), and, even more importantly, you're benchmarking the wrong thing.
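For comparison, here is a minimal sketch of the question's three measurements done with the standard-library timeit module instead of a hand-rolled time.time() loop. The loop counts, the best-of-five choice, and the names SETUP, STATEMENTS and bench are arbitrary choices for illustration; the numbers it prints will vary by machine and Python version, so none are shown here.

```python
import timeit

# Setup code runs before each timing run and is not itself timed.
SETUP = "import struct; from binascii import unhexlify; b = unhexlify('FF')"

# The three conversions from the question, as statement strings so that
# no extra function-call overhead is added to any of them.
STATEMENTS = {
    "struct.unpack": "struct.unpack('B', b)[0]",
    "int.from_bytes": "int.from_bytes(b, byteorder='big')",
    "b[0]": "b[0]",
}


def bench(number=100_000, repeat=5):
    """Return the best per-call time, in nanoseconds, for each statement."""
    results = {}
    for name, stmt in STATEMENTS.items():
        times = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
        # Use the minimum: slower runs mostly reflect interference from
        # other processes, not the cost of the statement itself.
        results[name] = min(times) / number * 1e9
    return results


if __name__ == "__main__":
    for name, ns in bench().items():
        print(f"{name:>15}: {ns:6.1f} ns")
```

IPython's %timeit magic does essentially the same thing with less typing.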

Using %timeit in IPython on my laptop, here are the times for the three parts of your process:

  • b = unhexlify(var): 209ns
  • b[0]: 16.0ns
  • temp = b[0]: 50.2ns

So, you're focusing on the fastest part of the process.

Meanwhile, once you've decided to use unhexlify to get a bytes object, of course the fastest way to get the first byte out of that is b[0]. How could anything possibly be any faster than that?

But if you take a step back:

  • int(var, 16): 233ns

This is nearly as fast as unhexlify(var)[0]: within 4%, a difference in the single-digit nanos, which may not even be consistently repeatable across systems, Python versions, or input values. But even if it were, it's hard to imagine an application where this 8ns makes a difference and where you couldn't get a much bigger speedup by stepping back and doing something at a higher level. Sure, it's not impossible that this could come up, but immediately jumping to how to micro-optimize this operation is almost always going to be a mistake.
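If you want to reproduce the whole-pipeline comparison (starting from the hex string rather than from an already-decoded bytes object), a sketch along these lines should work. The helper per_call_ns and the iteration counts are illustrative assumptions, not what produced the numbers above.

```python
import timeit
from binascii import unhexlify

var = 'FF'

# Both routes must agree on the value before their speed is worth comparing.
assert unhexlify(var)[0] == int(var, 16) == 255

SETUP = "from binascii import unhexlify; var = 'FF'; b = unhexlify(var)"

# Each step on its own, then the two complete hex-string-to-int routes.
STEPS = {
    "unhexlify(var)": "unhexlify(var)",
    "b[0]": "b[0]",
    "unhexlify(var)[0]": "unhexlify(var)[0]",
    "int(var, 16)": "int(var, 16)",
}


def per_call_ns(stmt, number=100_000, repeat=5):
    """Best per-call time for stmt, in nanoseconds."""
    times = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(times) / number * 1e9


if __name__ == "__main__":
    for label, stmt in STEPS.items():
        print(f"{label:>18}: {per_call_ns(stmt):6.1f} ns")
```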

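Even more importantly, unhexlify(var)[0] only works for single-byte values. Try it with, say, FF22 and you're going to get 255 instead of 65314. int(var, 16) and int.from_bytes will give you the right answer, and so will struct if you give it a format that matches the size (such as '>H'); with 'B' it raises struct.error rather than silently returning the wrong value.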
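A quick way to see the FF22 difference for yourself; the variable names are just for illustration:

```python
import struct
from binascii import unhexlify

var = 'FF22'
b = unhexlify(var)  # two bytes: 0xFF and 0x22

print(b[0])                                # 255: only the first byte
print(int(var, 16))                        # 65314: the whole hex string
print(int.from_bytes(b, byteorder='big'))  # 65314
print(struct.unpack('>H', b)[0])           # 65314: big-endian unsigned 16-bit

try:
    struct.unpack('B', b)  # 'B' expects a buffer of exactly one byte
except struct.error as exc:
    print("struct.error:", exc)
```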

And of course struct and int.from_bytes give you more flexibility: you can read bytes in either endianness, specify the exact size you expect, or (with struct.unpack_from) read exactly that many bytes out of an input buffer without consuming the whole buffer.
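A small illustration of that flexibility. The buffer layout here (a 2-byte header followed by a 4-byte length field and a payload) is a made-up example, not a real format:

```python
import struct

# Hypothetical buffer: 2-byte header, 4-byte length field, then payload.
buf = b'\x01\x02\x00\x00\x01\x00payload'

# int.from_bytes: choose the byte order explicitly.
value_be = int.from_bytes(buf[2:6], byteorder='big')     # 256
value_le = int.from_bytes(buf[2:6], byteorder='little')  # 65536

# struct.unpack_from: read exactly 4 bytes starting at offset 2, without
# slicing; the rest of the buffer is simply ignored.
(length_be,) = struct.unpack_from('>I', buf, 2)  # 256
(length_le,) = struct.unpack_from('<I', buf, 2)  # 65536
```

Plain struct.unpack('>I', buf) would fail here, because unpack requires the buffer to be exactly the size of the format.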

So, the right answer is to use the one that does what you actually want in the most obvious way. Something that's faster but wrong is not helpful, and even something that's faster but not obviously right often isn't helpful.

And this means that if what you're actually trying to do is (contrary to what you implied in your question) iterate over or index a bytes object as integers from 0 to 255, the right thing to do is for by in b: or b[0]. Not because it's fastest (although it is), but because it directly does exactly what you want to do; it's the One Obvious Way To Do It.
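In Python 3, both indexing and iterating a bytes object produce plain ints, while slicing produces another bytes object. A minimal sketch (the sample value 00FF7F is arbitrary):

```python
from binascii import unhexlify

b = unhexlify('00FF7F')  # b'\x00\xff\x7f'

first = b[0]               # indexing gives an int: 0
values = [by for by in b]  # iterating gives ints too: [0, 255, 127]
head = b[:1]               # slicing gives bytes: b'\x00'

# list(b) produces the same list of ints without an explicit loop.
assert values == list(b)
```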

Upvotes: 1
