Cardano
Cardano

Reputation: 951

How to extract the bits of larger numeric Numpy data types

Numpy has a library function, np.unpackbits, which will unpack a uint8 into a bit vector of length 8. Is there a correspondingly fast way to unpack larger numeric types? E.g. uint16 or uint32. I am working on a question that involves frequent translation between numbers, for array indexing, and their bit vector representations, and the bottleneck is our pack and unpack functions.

Upvotes: 26

Views: 14268

Answers (4)

Ross
Ross

Reputation: 637

This is the solution I use:

def unpackbits(x, num_bits):
    if np.issubdtype(x.dtype, np.floating):
        raise ValueError("numpy data type needs to be int-like")
    xshape = list(x.shape)
    x = x.reshape([-1, 1])
    mask = 2**np.arange(num_bits, dtype=x.dtype).reshape([1, num_bits])
    return (x & mask).astype(bool).astype(int).reshape(xshape + [num_bits])

This is a completely vectorized solution that works with any dimension ndarray and can unpack however many bits you want.

Upvotes: 21

user6597793
user6597793

Reputation: 1

This works for arbitrary arrays of arbitrary uint (i.e. also for multidimensional arrays and also for numbers larger than the uint8 max value).

It cycles over the number of bits, rather than over the number of array elements, so it is reasonably fast.

def my_ManyParallel_uint2bits(in_intAr,Nbits):
    ''' convert (numpyarray of uint => array of Nbits bits) for many bits in parallel'''
    inSize_T= in_intAr.shape
    in_intAr_flat=in_intAr.flatten()
    out_NbitAr= numpy.zeros((len(in_intAr_flat),Nbits))
    for iBits in xrange(Nbits):
        out_NbitAr[:,iBits]= (in_intAr_flat>>iBits)&1
    out_NbitAr= out_NbitAr.reshape(inSize_T+(Nbits,))
    return out_NbitAr  

A=numpy.arange(256,261).astype('uint16')
# array([256, 257, 258, 259, 260], dtype=uint16)
B=my_ManyParallel_uint2bits(A,16).astype('uint16')
# array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]], dtype=uint16)

Upvotes: 0

Phillip Cloud
Phillip Cloud

Reputation: 25672

You can do this with view and unpackbits

Input:

unpackbits(arange(2, dtype=uint16).view(uint8))

Output:

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]

For a = arange(int(1e6), dtype=uint16) this is pretty fast at around 7 ms on my machine

%%timeit
unpackbits(a.view(uint8))

100 loops, best of 3: 7.03 ms per loop

As for endianness, you'll have to look at http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html and apply the suggestions there depending on your needs.

Upvotes: 26

Roman Susi
Roman Susi

Reputation: 4199

I have not found any function for this too, but maybe using Python's builtin struct.unpack can help make the custom function faster than shifting and anding longer uint (note that I am using uint64).

>>> import struct
>>> N = np.uint64(2 + 2**10 + 2**18 + 2**26)
>>> struct.unpack('>BBBBBBBB', N)
(2, 4, 4, 4, 0, 0, 0, 0)

The idea is to convert those to uint8, use unpackbits, concatenate the result. Or, depending on your application, it may be more convenient to use structured arrays.

There is also built-in bin() function, which produces string of 0s and 1s, but I am not sure how fast it is and it requires postprocessing too.

Upvotes: 3

Related Questions