Reputation: 333
I have a list/array of numbers that I want to save to a binary file. The crucial part is that the numbers should not be stored as a predefined data type: the number of bits per value is constant for all values in the list but does not correspond to a typical data type (e.g. byte or int).
import numpy as np
# create 10 random numbers in range 0-63
values = np.int32(np.round(np.random.random(10)*63))
# each value requires exactly 6 bits
# how to save this to a file?
# just for debug/information: bit string representation
bitstring = "".join(map(lambda x: bin(x)[2:].zfill(6), values))
print(bitstring)
In the real project, there are more than a million values that I want to store at a given bit depth. I already tried the bitstring module, but appending each value to a BitArray takes a lot of time.
Upvotes: 0
Views: 1824
Reputation: 123473
There may be some numpy-specific way that makes things easier, but here's a pure Python (2.x) way to do it. It first converts the list of values into a single integer, since Python supports int values of any length. Next it converts that int value into a string of bytes and writes it to the file.
Note: If you're sure all the values will fit within the bit-width specified, the array_to_int() function could be sped up slightly by changing the (value & mask) it's using to just value.
import random

def array_to_int(values, bitwidth):
    mask = 2**bitwidth - 1
    shift = bitwidth * (len(values)-1)
    integer = 0
    for value in values:
        # first value ends up in the most significant bits
        integer |= (value & mask) << shift
        shift -= bitwidth
    return integer

# In Python 2.7 int and long don't have the "to_bytes" method found in Python 3.x,
# so here's one way to do the same thing.
# (In Python 3, n.to_bytes(length, 'big') gives the same result.)
def to_bytes(n, length):
    return ('%%0%dx' % (length << 1) % n).decode('hex')[-length:]

BITWIDTH = 6
#values = [random.randint(0, 2**BITWIDTH - 1) for _ in range(10)]
values = [0b000001 for _ in range(10)]  # create fixed pattern for debugging
values[9] = 0b011111  # make last one different so it can be spotted

# just for debug/information: bit string representation
bitstring = "".join(map(lambda x: bin(x)[2:].zfill(BITWIDTH), values))
print(bitstring)

bigint = array_to_int(values, BITWIDTH)
width = BITWIDTH * len(values)
print('{:0{width}b}'.format(bigint, width=width))  # show integer's value in binary

num_bytes = (width + 7) // 8  # round up to a whole number of 8-bit bytes
with open('data.bin', 'wb') as file:
    file.write(to_bytes(bigint, num_bytes))
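As for the numpy-specific route mentioned at the top, one possible sketch (written for Python 3, not benchmarked here) uses np.packbits and np.unpackbits. The helper names pack_values and unpack_values are made up for illustration, and the values are assumed to be non-negative and to fit in bitwidth bits (at most 32). Note that packbits pads the last byte with zero bits at the end, whereas the big-integer version above pads at the front, so the two outputs are not byte-identical when the total bit count is not a multiple of 8.

```python
import numpy as np

def pack_values(values, bitwidth):
    """Pack non-negative ints (each < 2**bitwidth) into bytes, MSB first."""
    values = np.asarray(values, dtype=np.uint32)
    # One row per value, one column per bit, most significant bit first.
    shifts = np.arange(bitwidth - 1, -1, -1, dtype=np.uint32)
    bits = ((values[:, None] >> shifts) & 1).astype(np.uint8)
    # packbits fills the final byte with zero bits on the right.
    return np.packbits(bits.ravel()).tobytes()

def unpack_values(data, bitwidth, count):
    """Inverse of pack_values; count is the number of stored values."""
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))
    # Drop the padding bits, then regroup into one row per value.
    bits = bits[:count * bitwidth].reshape(count, bitwidth).astype(np.uint32)
    shifts = np.arange(bitwidth - 1, -1, -1, dtype=np.uint32)
    return (bits << shifts).sum(axis=1)

values = [0b000001] * 9 + [0b011111]
data = pack_values(values, 6)          # 60 bits, stored in 8 bytes
with open('data.bin', 'wb') as f:
    f.write(data)
restored = unpack_values(data, 6, len(values))
```

The number of values has to be stored or known separately, because the padding bits cannot be told apart from a trailing zero value when reading the file back.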
Upvotes: 1
Reputation: 36
Since you give an example with a string, I'll assume that's how you get the results. This means performance is probably never going to be great. If you can, try creating bytes directly instead of via a string.
Side note: I'm using Python 3 which might require you to make some changes for Python 2. I think this code should work directly in Python 2, but there are some changes around bytearrays and strings between 2 and 3, so make sure to check.
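byt = bytearray((len(bitstring) + 7) // 8)  # round up to whole bytes
for i, b in enumerate(bitstring):
    # bits are stored least significant bit first within each byte
    byt[i // 8] |= (b == '1') << (i % 8)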
and for getting the bits back:
bitret = ''
for b in byt:
    for i in range(8):
        bitret += str((b >> i) & 1)
# note: bitret includes the '0' padding bits up to the next multiple of 8
For millions of bits/bytes you'll want to convert this to a streaming method instead, as you'd need a lot of memory otherwise.
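A rough sketch of what such a streaming version could look like, working on the values directly rather than on a bit string: a small integer accumulator collects bits and full bytes are flushed to the file in chunks. The function names write_packed and read_packed and the chunk_size parameter are invented for this example. It uses most-significant-bit-first order (unlike the snippet above) and pads the final byte with zero bits, so the reader needs to be told how many values to expect.

```python
import io

def write_packed(values, bitwidth, fileobj, chunk_size=65536):
    """Write values to fileobj using bitwidth bits each, MSB first."""
    mask = (1 << bitwidth) - 1
    acc = 0      # accumulator for bits not yet written
    nbits = 0    # number of valid bits currently in acc
    out = bytearray()
    for value in values:
        acc = (acc << bitwidth) | (value & mask)
        nbits += bitwidth
        while nbits >= 8:              # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1        # keep only the leftover bits
        if len(out) >= chunk_size:     # flush so memory use stays bounded
            fileobj.write(out)
            out = bytearray()
    if nbits:                          # pad the last byte with zero bits
        out.append((acc << (8 - nbits)) & 0xFF)
    fileobj.write(out)

def read_packed(fileobj, bitwidth, count, chunk_size=65536):
    """Yield count values of bitwidth bits each from fileobj."""
    mask = (1 << bitwidth) - 1
    acc = 0
    nbits = 0
    produced = 0
    while produced < count:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break                      # file ended early
        for byte in bytearray(chunk):
            acc = (acc << 8) | byte
            nbits += 8
            while nbits >= bitwidth and produced < count:
                nbits -= bitwidth
                yield (acc >> nbits) & mask
                produced += 1
            acc &= (1 << nbits) - 1
            if produced == count:
                return                 # ignore trailing padding

# In-memory buffer standing in for a file opened in 'wb' / 'rb' mode.
buf = io.BytesIO()
write_packed([1, 2, 63], 6, buf)
buf.seek(0)
print(list(read_packed(buf, 6, 3)))    # [1, 2, 63]
```

Because values can be any iterable, including a generator, neither the full list nor the full bit string has to be held in memory at once.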
Upvotes: 0