Enrique Torres

Reputation: 289

Modifying a Numpy float32 bitwise and returning the modified np.float32 value

I'm currently working on a bit chopping algorithm for simulation of memory footprint reduction when training neural networks. I'm using PyTorch to achieve this.

What I'm basically trying to do is set the least significant bits of the float32 mantissa to 0, to see for now whether the neural network still trains and how much precision it loses depending on the number of bits that are zeroed.

My problem is that every value in the tensors is of type NumPy float32, and I would like to get the literal bit representation of the value (how the float32 value is actually stored in memory) as an integer, apply the bitwise modification, and convert it back to np.float32.

I've tried the following (note: x is of type numpy.ndarray):

    print(x)
    value_as_int = self.x.view(np.int32)
    print(value_as_int)
    value_as_int = value_as_int&0xFFFFFFFE
    print(value_as_int)
    new_float = value_as_int.view(np.float32)
    print(new_float)

Here's an example output of the part that works:

    0.13498048
    1040857171
    1040857170

This converts the value to its integer bit representation and lets me set the last bit to 0, but when I try to convert it back to np.float32 I get the following error:

    ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

Is there a proper way to do this? Am I missing something in my code or is it the wrong approach?

Thank you in advance

Upvotes: 2

Views: 650

Answers (1)

jodag

Reputation: 22214

The problem here is that 0xFFFFFFFE is a Python int (rather than a NumPy int32). NumPy implicitly upcasts to int64 when the bitwise AND operator is applied, so the result has an 8-byte itemsize and can no longer be viewed as a 4-byte float32. To avoid this, make your bit mask an np.uint32.

It seems that on Windows (but not Linux) you need to use np.uint32 instead of np.int32 to avoid a "Python int too large to convert to C long" error when your bit mask is larger than 0x7FFFFFFF.

    value_as_int = self.x.view(np.uint32)
    value_as_int = value_as_int & np.uint32(0xFFFFFFFE)
    new_float = value_as_int.view(np.float32)
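
As a rough sketch of how the same idea might extend to zeroing several low mantissa bits at once, the helper below builds the mask from a bit count. The name `chop_mantissa_bits` and its `n_bits` parameter are hypothetical (they are not from the question), and it assumes the input is, or can be converted to, float32.

```python
import numpy as np

def chop_mantissa_bits(x, n_bits):
    """Zero the n_bits least significant mantissa bits of float32 values.

    Hypothetical helper for illustration; accepts a scalar or an array.
    """
    if not 0 <= n_bits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    arr = np.asarray(x, dtype=np.float32)
    # Reinterpret the same 4 bytes per element as unsigned 32-bit integers.
    bits = arr.view(np.uint32)
    # Build the mask in Python, then wrap it in np.uint32 so the AND
    # result keeps a 4-byte itemsize instead of being upcast.
    mask = np.uint32((0xFFFFFFFF << n_bits) & 0xFFFFFFFF)
    chopped = bits & mask
    # Same itemsize, so viewing back as float32 is allowed.
    return chopped.view(np.float32)
```

Calling `chop_mantissa_bits(x, 1)` should match the single-bit case above, and the input is left unmodified because the AND produces a new array.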

Upvotes: 1
