python2.7 - store boolean values as individual bits when writing to disk

Question

I'm writing code that converts integers into padded 8-bit strings. I would then like to write those strings to a binary file. I am having problems figuring out the proper dtype to be used with the numpy array that I am currently using.

In the following code when I have bin_data variable set up with dtype=np.int8 the output is:

$ python bool_dtype.py 
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 1, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
[0 0 0 0 1 0 0 0 0]
16

When bin_data is set up with dtype=np.bool_, every element comes out as True, as shown below:

$ python bool_dtype.py 
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 1, bool(a[j]) = True
a[j] = 1, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 0, bool(a[j]) = True
a[j] = 1, bool(a[j]) = True
a[j] = 1, bool(a[j]) = True
[ True  True  True  True  True  True  True  True  True]
16

When I look at the xxd dump of the data written with dtype=np.int8, I see, as expected, a full byte used to represent each bit (1 or 0), i.e. 00000001 or 00000000. Using dtype=np.bool_ leads to the same problem.

So the two main questions I have are

Why does bool always evaluate to True when reading an array element?
How can I store the data more efficiently when I write it to the file, so that a single bit is not stored as a whole byte but is instead packed right after the previous bit?

Here is the code in question. Thanks!

#!/usr/bin/python2.7

import numpy as np
import os

# x = np.zeros(200,dtype=np.bool_)
# for i in range(0,len(x)):
#     if i%2 != 1:
#         x[i] = 1

data_size = 2
data = np.random.randint(0,9,data_size)
tx=''
for i in range(0,data_size):
    tx += chr(data[i])
data = tx
a = np.zeros(8,dtype=np.int8)
bin_data = np.zeros(len(data)*8,dtype=np.bool_)

# each i is a character byte in data string
for i in range(0,len(data)):
    # formats data in 8bit binary without the 0b prefix
    a = format(ord(data[i]),'b').zfill(8)
    for j in range(0,len(a)):
        bin_data[i*len(a) + j] = a[j]
        print("a[j] = {}, bool(a[j]) = {}").format(a[j], bool(a[j]))

print bin_data[1:10]
print len(bin_data)

path = os.getcwd()
path = path + '/bool_data.bin'
data_file = open(path, "wb")
data_file.write(bin_data)
data_file.close()

edit:

What I expect to see when using dtype=np.bool_

>>> import numpy as np
>>> a = np.zeros(2,dtype=np.bool_)
>>> a
array([False, False], dtype=bool)
>>> a[1] = 1
>>> a
array([False,  True], dtype=bool)

bpachev · Accepted Answer

The reason that bool always returns True is that a[j] is a nonempty string (either '0' or '1'), and every nonempty string is truthy. You need to cast a[j] to an int before testing it with bool (and also before assigning it as an entry of a numpy bool array).
To store the bits compactly, you can call numpy.packbits to compress your boolean array into a uint8 array (it zero-pads the last byte for you if needed), and then call numpy.unpackbits to reverse the operation.
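
Both points can be illustrated with a minimal sketch. The input values and variable names here are made up for the example and are not taken from the question's script:

```python
import numpy as np

# Any nonempty string is truthy, including the character '0'.
assert bool('0') is True
assert bool(int('0')) is False

# Hypothetical input: two byte values to be expanded into bits.
data = [3, 8]

# Convert each character of the 8-bit string to int before storing it.
bits = np.array(
    [int(ch) for value in data for ch in format(value, '08b')],
    dtype=np.bool_,
)

# packbits stores 8 booleans per byte; unpackbits reverses it.
packed = np.packbits(bits)
restored = np.unpackbits(packed).astype(np.bool_)
```

Here the 16 booleans occupy 2 bytes once packed, instead of 16. Note that numpy.unpackbits returns a uint8 array of 0s and 1s, so the astype call is only needed if you want booleans back.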

Edit:

If your boolean array has a length that isn't a multiple of 8, after packing and unpacking your array will be zero-padded to make the length a multiple of 8. In this case, you have two options:

If you can safely truncate your data to have a number of bits that is divisible by 8, then do so. Something like: data=data[:8*(len(data)//8)]
If you can't afford to truncate, then you will need to record the number of meaningful bits somehow. I suggest making the first byte of your packed data equal to the number of meaningful bits mod 8. This adds only one byte of storage overhead and very little compute time. Something like:

Packing

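import numpy as np

bool_data = np.array([True, True, True])
nbits = len(bool_data)
rem = nbits % 8        # meaningful bits in the last byte (0 means all 8)
nbytes = nbits // 8    # floor division, same result on Python 2 and 3
if rem:
    nbytes += 1        # one extra byte for the partial group
data = np.empty(1 + nbytes, dtype=np.uint8)
data[0] = rem          # header byte
data[1:] = np.packbits(bool_data)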

Unpacking

rem = int(data[0])     # plain int, so 8 - rem is not computed in uint8 arithmetic
bool_data = np.unpackbits(data[1:])
if rem:
    bool_data = bool_data[:-(8 - rem)]   # drop the zero padding
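
Since the question is about writing to disk, here is a sketch of the same scheme wrapped into two helper functions with a file round trip. The names pack_bits and unpack_bits, the example bits, and the temporary file location are assumptions made for illustration, not part of the question's script:

```python
import os
import tempfile

import numpy as np


def pack_bits(bool_data):
    """Return a uint8 array: one header byte (bit count mod 8), then the packed bits."""
    bool_data = np.asarray(bool_data, dtype=np.bool_)
    rem = len(bool_data) % 8
    header = np.array([rem], dtype=np.uint8)
    return np.concatenate((header, np.packbits(bool_data)))


def unpack_bits(data):
    """Reverse pack_bits, dropping the zero padding in the last byte."""
    rem = int(data[0])  # plain int so that 8 - rem stays an ordinary integer
    bits = np.unpackbits(data[1:])
    if rem:
        bits = bits[:-(8 - rem)]
    return bits.astype(np.bool_)


# Hypothetical example data: 11 bits, so the last byte is only partly used.
original = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1], dtype=np.bool_)

path = os.path.join(tempfile.mkdtemp(), 'bool_data.bin')
pack_bits(original).tofile(path)            # write raw bytes
loaded = np.fromfile(path, dtype=np.uint8)  # read them back
recovered = unpack_bits(loaded)
```

The 11 bits end up as 3 bytes on disk: one header byte plus two data bytes.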
