Alexander

Reputation: 333

Save list of numbers to (binary) file with defined bits per number

I have a list/array of numbers that I want to save to a binary file. The crucial part is that the numbers should not be saved as a predefined data type. The number of bits per value is the same for all values in the list, but it does not correspond to a typical data type (e.g. byte or int).

import numpy as np

# create 10 random numbers in range 0-63
values = np.int32(np.round(np.random.random(10) * 63))

# each value requires exactly 6 bits
# how to save this to a file?

# just for debug/information: bit string representation
bitstring = "".join(bin(x)[2:].zfill(6) for x in values)
print(bitstring)

In the real project, there are more than a million values that I want to store with a given bit depth. I already tried the bitstring module, but appending each value to the BitArray takes a lot of time.

Upvotes: 0

Views: 1824

Answers (2)

martineau

Reputation: 123473

There may be some numpy-specific way that makes things easier, but here's a pure Python (2.x) way to do it. It first converts the list of values into a single integer, since Python supports int values of any length. Next it converts that int value into a string of bytes and writes it to the file.

Note: If you're sure all the values will fit within the specified bit width, the array_to_int() function could be sped up slightly by replacing the (value & mask) expression with just value.

import random

def array_to_int(values, bitwidth):
    mask = 2**bitwidth - 1
    shift = bitwidth * (len(values)-1)
    integer = 0
    for value in values:
        integer |= (value & mask) << shift
        shift -= bitwidth
    return integer

# In Python 2.7 int and long don't have the "to_bytes" method found in Python 3.x,
# so here's one way to do the same thing.
def to_bytes(n, length):
    return ('%%0%dx' % (length << 1) % n).decode('hex')[-length:]

BITWIDTH = 6
#values = [random.randint(0, 2**BITWIDTH - 1) for _ in range(10)]
values = [0b000001 for _ in range(10)]  # create fixed pattern for debugging
values[9] = 0b011111  # make last one different so it can be spotted

# just for debug/information: bit string representation
bitstring = "".join(bin(x)[2:].zfill(BITWIDTH) for x in values)
print(bitstring)

bigint = array_to_int(values, BITWIDTH)
width = BITWIDTH * len(values)
print('{:0{width}b}'.format(bigint, width=width))  # show integer's value in binary

num_bytes = (width + 7) // 8  # round up to a whole number of 8-bit bytes
with open('data.bin', 'wb') as file:
    file.write(to_bytes(bigint, num_bytes))
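
On Python 3 the to_bytes() helper above is unnecessary, because int already provides to_bytes() and from_bytes(). The following is only a sketch of the same idea with a matching reader, assuming non-negative values and a reader that knows both the bit width and the number of values (the padding bits at the front are otherwise indistinguishable from data). The names pack_values and unpack_values are made up for this illustration.

```python
def pack_values(values, bitwidth):
    """Pack non-negative ints into bytes, bitwidth bits each, first value first."""
    mask = (1 << bitwidth) - 1
    integer = 0
    for value in values:
        # Append the next value to the low end of the big integer.
        integer = (integer << bitwidth) | (value & mask)
    total_bits = bitwidth * len(values)
    num_bytes = (total_bits + 7) // 8  # round up to whole bytes
    # Big-endian output; any padding bits end up as leading zeros.
    return integer.to_bytes(num_bytes, "big")


def unpack_values(data, bitwidth, count):
    """Inverse of pack_values(); count is the number of stored values."""
    mask = (1 << bitwidth) - 1
    integer = int.from_bytes(data, "big")
    values = []
    for _ in range(count):
        # Values come off the low end, i.e. in reverse order.
        values.append(integer & mask)
        integer >>= bitwidth
    values.reverse()
    return values
```

To store the result, write the returned bytes to a file opened in 'wb' mode, as in the code above. Like the original, this builds one very large integer, so it may become slow for millions of values.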

Upvotes: 1

Invibsid

Reputation: 36

Since you give an example with a string, I'll assume that's how you get the results. This means performance is probably never going to be great. If you can, try creating bytes directly instead of via a string.

Side note: I'm using Python 3 which might require you to make some changes for Python 2. I think this code should work directly in Python 2, but there are some changes around bytearrays and strings between 2 and 3, so make sure to check.

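# one byte per 8 bits, rounded up
byt = bytearray((len(bitstring) + 7) // 8)
for i, b in enumerate(bitstring):
    # bit i goes into byte i//8, least significant bit first
    byt[i // 8] |= (b == '1') << (i % 8)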

and for getting the bits back:

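bitret = ''
for b in byt:
    for i in range(8):
        bitret += str((b >> i) & 1)  # least significant bit first, as when packing
# bitret may end with padding zeros from the last byte; trim it to the original bit count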

For millions of bits/bytes you'll want to convert this to a streaming method instead, as you'd need a lot of memory otherwise.
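
One way such a streaming version could look is sketched below. It skips the intermediate bit string and keeps only a small bit accumulator plus a write buffer in memory. Note the assumptions: it uses a different bit layout than the snippets above (most significant bit first, with the final byte zero-padded on the right), the reader must be told how many values were stored, and the names write_packed, read_packed and flush_size are invented for this example.

```python
def write_packed(values, bitwidth, out, flush_size=65536):
    """Stream non-negative ints to a binary file object, bitwidth bits each."""
    mask = (1 << bitwidth) - 1
    acc = 0    # accumulator holding bits not yet written
    nbits = 0  # number of valid bits in acc
    buf = bytearray()
    for value in values:
        acc = (acc << bitwidth) | (value & mask)
        nbits += bitwidth
        while nbits >= 8:
            # Emit the oldest 8 bits as one byte.
            nbits -= 8
            buf.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
        if len(buf) >= flush_size:
            out.write(buf)
            buf = bytearray()
    if nbits:
        # Zero-pad the final partial byte on the right.
        buf.append((acc << (8 - nbits)) & 0xFF)
    out.write(buf)


def read_packed(data, bitwidth, count):
    """Decode count values from bytes produced by write_packed()."""
    mask = (1 << bitwidth) - 1
    acc = 0
    nbits = 0
    values = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= bitwidth and len(values) < count:
            nbits -= bitwidth
            values.append((acc >> nbits) & mask)
        acc &= (1 << nbits) - 1
    return values
```

Because values can be any iterable, a generator or an array can be passed directly, for example write_packed(values, 6, f) with f opened via open('data.bin', 'wb').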

Upvotes: 0
