Save list of numbers to (binary) file with defined bits per number

Question

I have a list/array of numbers that I want to save to a binary file. The crucial part is that the numbers should not be saved as a predefined data type. The number of bits per value is constant for all values in the list but does not correspond to a typical data type (e.g. byte or int).

import numpy as np

# create 10 random numbers in range 0-63
values = np.int32(np.round(np.random.random(10)*63))

# each value requires exactly 6 bits
# how to save this to a file?

# just for debug/information: bit string representation
bitstring = "".join(map(lambda x: bin(x)[2:].zfill(6), values))
print(bitstring)

In the real project there are more than a million values that I want to store with a given bit depth. I already tried the bitstring module, but appending each value to a BitArray takes a lot of time.

martineau · Accepted Answer

There may be some numpy-specific way that makes things easier, but here's a pure Python (2.x) way to do it. It first converts the list of values into a single integer, since Python supports int values of any length. Next, it converts that int value into a string of bytes and writes it to the file.

Note: If you're sure all the values will fit within the bit-width specified, the array_to_int() function could be sped up slightly by changing the (value & mask) it's using to just value.

import random

def array_to_int(values, bitwidth):
    mask = 2**bitwidth - 1
    shift = bitwidth * (len(values)-1)
    integer = 0
    for value in values:
        integer |= (value & mask) << shift
        shift -= bitwidth
    return integer

# In Python 2.7 int and long don't have the "to_bytes" method found in Python 3.x,
# so here's one way to do the same thing.
def to_bytes(n, length):
    return ('%%0%dx' % (length << 1) % n).decode('hex')[-length:]

BITWIDTH = 6
#values = [random.randint(0, 2**BITWIDTH - 1) for _ in range(10)]
values = [0b000001 for _ in range(10)]  # create fixed pattern for debugging
values[9] = 0b011111  # make last one different so it can be spotted

# just for debug/information: bit string representation
bitstring = "".join(map(lambda x: bin(x)[2:].zfill(BITWIDTH), values))
print(bitstring)

bigint = array_to_int(values, BITWIDTH)
width = BITWIDTH * len(values)
print('{:0{width}b}'.format(bigint, width=width))  # show integer's value in binary

num_bytes = (width + 7) // 8  # round up to a whole number of 8-bit bytes
with open('data.bin', 'wb') as file:
    file.write(to_bytes(bigint, num_bytes))
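
In Python 3, where int has to_bytes() and from_bytes() built in, the same idea might look like the sketch below. The helper names pack_values and unpack_values are made up for this illustration, the temporary directory is only there to keep the example self-contained, and the number of stored values is assumed to be known when reading the file back (it is not recorded in the file).

```python
import os
import tempfile

def pack_values(values, bitwidth):
    """Pack non-negative ints into bytes, bitwidth bits each (hypothetical helper)."""
    mask = (1 << bitwidth) - 1
    integer = 0
    for value in values:
        # Shift what we have so far to the left and append the next value.
        integer = (integer << bitwidth) | (value & mask)
    num_bytes = (bitwidth * len(values) + 7) // 8  # round up to whole bytes
    # Big-endian: unused padding bits end up as leading zeros.
    return integer.to_bytes(num_bytes, byteorder='big')

def unpack_values(data, bitwidth, count):
    """Inverse of pack_values; count must be known by the caller."""
    mask = (1 << bitwidth) - 1
    integer = int.from_bytes(data, byteorder='big')
    values = []
    for _ in range(count):
        values.append(integer & mask)  # lowest bits hold the last value
        integer >>= bitwidth
    values.reverse()
    return values

BITWIDTH = 6
values = [0b000001] * 9 + [0b011111]  # same debugging pattern as above
packed = pack_values(values, BITWIDTH)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.bin')
    with open(path, 'wb') as f:
        f.write(packed)
    with open(path, 'rb') as f:
        restored = unpack_values(f.read(), BITWIDTH, len(values))

print(len(packed))         # number of bytes written
print(restored == values)  # round-trip check
```

Because the padding bits sit at the front of the file, a reader needs both the bit width and the value count to recover the original list.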

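As for a numpy-specific route, one possibility is to expand every value into its individual bits and let np.packbits pack them eight to a byte. This is only a sketch, not a benchmarked solution: pack_bits and unpack_bits are hypothetical names, values are assumed to be non-negative and to fit into bitwidth bits, and the value count again has to be known when reading. It avoids the Python-level loop, which may matter for a million or more values, since building one huge integer copies the growing integer at every step.

```python
import os
import tempfile
import numpy as np

def pack_bits(values, bitwidth):
    """Pack values into a uint8 array, bitwidth bits each (hypothetical helper)."""
    values = np.asarray(values, dtype=np.uint32)
    # Bit positions within one value, most significant first.
    shifts = np.arange(bitwidth - 1, -1, -1, dtype=np.uint32)
    # Shape (n, bitwidth): one row of 0/1 entries per value.
    bits = ((values[:, None] >> shifts) & 1).astype(np.uint8)
    # packbits pads the final byte with trailing zero bits.
    return np.packbits(bits.ravel())

def unpack_bits(packed, bitwidth, count):
    """Inverse of pack_bits; count must be known by the caller."""
    bits = np.unpackbits(packed)[:count * bitwidth].reshape(count, bitwidth)
    weights = np.uint32(1) << np.arange(bitwidth - 1, -1, -1, dtype=np.uint32)
    return (bits * weights).sum(axis=1)

BITWIDTH = 6
rng = np.random.default_rng(0)
values = rng.integers(0, 2**BITWIDTH, size=10)
packed = pack_bits(values, BITWIDTH)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.bin')
    packed.tofile(path)                        # raw bytes, no header
    loaded = np.fromfile(path, dtype=np.uint8)

restored = unpack_bits(loaded, BITWIDTH, len(values))
print(np.array_equal(restored, values))        # round-trip check
```

Note that the byte layout differs from the big-integer approach: here the padding bits come at the end of the file rather than at the beginning, so the two formats are not interchangeable.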