Reputation: 1532

Convert a 64 bit integer into 8 separate 1 byte integers in python

In python, I have been given a 64 bit integer. This Integer was created by taking several different 8 bit integers and mashing them together into one giant 64 bit integer. It is my job to separate them again.

For example:

Source number: 2592701575664680400
Binary (64 bits): 0010001111111011001000000101100010101010000101101011111000000000
int 1: 00100011 (35)
int 2: 11111011 (251)
int 3: 00100000 (32)
int 4: 01011000 (88)
int 5: 10101010 (170)
int 6: 00010110 (22)
int 7: 10111110 (190)
int 8: 00000000 (0)

So what I would like to do is take my source number 2592701575664680373 and return an array of length 8, where each int in the array are the ints listed above.

I was going to use struct, but to be perfectly honest, reading the documentation hasn't made it quite clear exactly how I would accomplish that.

Upvotes: 14

Answers (5)

Marcus

Reputation: 1

This seems faster for a bunch of unit64. Uses numpy.

from cytpes import *
import numpy as np
l1 = c_uint64 * 512
payload64 = l1(0)
payload8 = np.frombuffer(payload64, dtype=np.uint8)

Where payload8 is an array of np.unit8 afterwards 8 times the size of payload64 and has the converterd bytes in it.

For me it is faster than the struct variant...

for i in range(len(payload64)):
       payload8[i*8:i*8+8] = struct.unpack('8B', struct.pack('Q', payload64[i]))

Upvotes: 0

jojonas

Reputation: 1675

Solution

Solution without converting the number to a string:

x = 0b0010001111111011001000000101100010101010000101101011111000000000

numbers = list((x >> i) & 0xFF for i in range(0,64,8))
print(numbers)                    # [0, 190, 22, 170, 88, 32, 251, 35]
print(list(reversed(numbers)))    # [35, 251, 32, 88, 170, 22, 190, 0]

Explanation

Here I used list comprehensions, making a loop in increments of 8 over i. So i takes the values 0, 8, 16, 24, 32, 40, 48, 56. Every time, the bitshift operator >> temporarily shifts the number x down by i bits. This is equivalent to dividing by 256^i.

So the resulting number is:

i = 0:   0010001111111011001000000101100010101010000101101011111000000000
i = 8:           00100011111110110010000001011000101010100001011010111110
i = 16:                  001000111111101100100000010110001010101000010110
i = 24:                          0010001111111011001000000101100010101010
i = 32:                                  00100011111110110010000001011000
i = 40:                                          001000111111101100100000
i = 48:                                                  0010001111111011
i = 56:                                                          00100011

By usig & 0xFF, I select the last 8 bits of this number. Example:

x >> 48:           001000111111101100100000
0xff:                              11111111
(x >> 48) & 0xff:  000000000000000000100000

Since the leading zeros do not matter, you have the desired number.

The result is converted to a list and printed in normal and reversed order (like OP wanted it).

Performance

I compared the timing of this result to the other solutions proposed in this thread:

In: timeit list(reversed([(x >> i) & 0xFF for i in range(0,64,8)]))
100000 loops, best of 3: 13.9 µs per loop

In: timeit [(x >> (i * 8)) & 0xFF for i in range(7, -1, -1)]
100000 loops, best of 3: 11.1 µs per loop

In: timeit [(x >> i) & 0xFF for i in range(63,-1,-8)]
100000 loops, best of 3: 10.2 µs per loop

In: timeit reversed(struct.unpack('8B', struct.pack('Q', x)))
100000 loops, best of 3: 3.22 µs per loop

In: timeit reversed(struct.pack('Q', x))
100000 loops, best of 3: 2.07 µs per loop

Result: my solution is not the fastest! Currently, using struct directly (as proposed by Mark Ransom) seems to be the fastest snippet.

Upvotes: 10

Brent Washburne

Reputation: 13158

Here's a version using struct:

import struct
n = 2592701575664680400
bytes = struct.unpack('8B', struct.pack('Q', n))

The bytes are returned in the opposite order that you showed in your question.

Here are the performance stats:

python -m timeit -s "import struct" "struct.unpack('8B', struct.pack('Q', 2592701575664680400))"
1000000 loops, best of 3: 0.33 usec per loop

On my computer, this is three times faster than the byte-shifting solution.

Upvotes: 2

Mark Ransom

Reputation: 308500

In Python 2.x, struct.pack returns a string of bytes. It's easy to convert that to an array of integers.

>>> bytestr = struct.pack('>Q', 2592701575664680400)
>>> bytestr
'#\xfb X\xaa\x16\xbd\xd0'
>>> [ord(b) for b in bytestr]
[35, 251, 32, 88, 170, 22, 189, 208]

The struct module in python is used for converting from python object to byte strings, typically packed according to C structure packing rules. struct.pack takes a format specifier (a string which describes how the bytes of the structure should be laid out), and some python data, and packs it into a byte string. struct.unpack does the inverse, taking a format specifier and a byte string and returning a tuple of unpacked data once again in the format of python objects.

The format specifier being used has two parts. The lead character specifies the endianness (byte order) of the string. The following characters specify the types of the fields of the struct being packed or unpacked. So '>Q' means to pack the given data as a big-endian unsigned long long. To get the bytes in the opposite order, you could use < instead for little-endian.

The final operation is a list comprehension which iterates over the characters of the byte string and uses the ord builtin function to get the integer representation of that character.

Final note: Python doesn't actually have a concept of integer size. In 2.x, there is int which is limited to 32 bits, and long which is of unlimited size. In 3.x those two were unified into a single type. So even though this operation guarantees to give integers that take up only one byte, noting about python will force the resulting integers to stay that way if you use them in other operations.

Upvotes: 9

Padraic Cunningham

Reputation: 180502

bn = "0010001111111011001000000101100010101010000101101011111000000000"

print([int(bn[i:i+8], 2) for i in range(0,len(bn), 8)])
[35, 251, 32, 88, 170, 22, 190, 0]

If you are using the binary representation of n then the output would be different:

n = 2592701575664680373
bn = bin(n)

print([int(bn[i:i+8], 2) for i in range(0,len(bn), 8)])
[35, 251, 32, 88, 170, 22, 189, 181]

Some timings:

In [16]: %%timeit                                                
numbers = list((n >> i) & 0xFF for i in range(0,64,8))
list(reversed(numbers))
   ....: 
100000 loops, best of 3: 2.97 µs per loop

In [17]: timeit [(n >> (i * 8)) & 0xFF for i in range(7, -1, -1)]
1000000 loops, best of 3: 1.73 µs per loop

In [18]: %%timeit                                                
bn = bin(n)
[int(bn[i:i+8], 2) for i in range(0,len(bn), 8)]
   ....: 
100000 loops, best of 3: 3.96 µs per loop

You can also just divmod:

out = []
for _ in range(8):
    n, i = divmod(n, 256)
    out.append(i) 
out = out[::-1]

Which is almost as efficient:

In [31]: %%timeit
   ....: n = 2592701575664680411
   ....: out = []
   ....: for _ in range(8):
   ....:     n, i = divmod(n, 1 << 8)
   ....:     out.append(i)
   ....: out[::-1]
   ....: 
100000 loops, best of 3: 2.35 µs per loop

There is very little advantage in bit shifting with python, I would be more inclined to use whatever you and others find more readable.