Olfway

Reputation: 61

Why does Tensorflow cast int32/int32 to float64 and how to stop it?

I am dividing a tensor of type int32 by a tensor of type int32, and the result is float64. I can't find an explanation of why this happens or what implicit rules TensorFlow follows here. I have not explicitly defined a dtype for any tensor, but I have checked all of them, and none have a 64-bit type until after the division.

I've tried different formulations of division, such as tf.divide; all give the same result.

My code looks like:

a_cdf = a / tf.size(a)

with a being of type tf.int32.

What I want is to get the result as float32, so I can write my function without an explicit cast.

Upvotes: 5

Views: 2572

Answers (1)

javidcf

Reputation: 59681

This is by design. "True" division in TensorFlow (that is, real division) uses a _TRUEDIV_TABLE that specifies the casting rules for each type, and it currently reads:

# Conversion table for __truediv__.  None entries mean no conversion required.
_TRUEDIV_TABLE = {
    dtypes.uint8: dtypes.float32,
    dtypes.int8: dtypes.float32,
    dtypes.uint16: dtypes.float32,
    dtypes.int16: dtypes.float32,
    dtypes.int32: dtypes.float64,
    dtypes.int64: dtypes.float64,
    dtypes.bfloat16: None,
    dtypes.float16: None,
    dtypes.float32: None,
    dtypes.float64: None,
    dtypes.complex64: None,
    dtypes.complex128: None,
}

This means that int32 tensors will be converted to float64 on true division. If you want to obtain float32 output, either use a smaller integer type or cast your inputs to float32.
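NumPy happens to apply the same float64 promotion for true division of int32 values, so the behavior (and the cast-based fix) can be illustrated without TensorFlow; the equivalent fix in your code would be something along the lines of `tf.cast(a, tf.float32) / tf.cast(tf.size(a), tf.float32)`:

```python
import numpy as np

a = np.arange(10, dtype=np.int32)

# True division of int32 by an integer promotes to float64,
# mirroring the int32 entry in TensorFlow's _TRUEDIV_TABLE.
print((a / a.size).dtype)  # float64

# Casting the inputs to float32 first keeps the result in float32.
a_cdf = a.astype(np.float32) / np.float32(a.size)
print(a_cdf.dtype)  # float32
```

Note that NumPy promotes *all* integer true divisions to float64, so it only mirrors TensorFlow's table for the 32- and 64-bit integer types, not the smaller ones.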

The rationale for this is another matter. If I had to guess, on the one hand I'd say that if you are using 8- or 16-bit integers you are probably concerned about memory, so a smaller result type makes sense. On the other hand, you could make the following argument:

import numpy as np

# Compute smallest positive divisions with 16 and 32 bits
smallest_16bit_fraction = 1 / ((1 << 16) - 1)
smallest_32bit_fraction = 1 / (-(1 << 31))  # 31 bits because int32 is signed
# Compute one plus the smallest fractions with 32 and 64 bit floats
print(np.float32(1) + np.float32(smallest_16bit_fraction))
# 1.0000153
print(np.float64(1) + np.float64(smallest_16bit_fraction))
# 1.0000152590218967
print(np.float32(1) + np.float32(smallest_32bit_fraction))
# 1.0
print(np.float64(1) + np.float64(smallest_32bit_fraction))
# 0.9999999995343387

So you might think that, since this is a division of two integer values, you would want to combine the result with integer values again; but as you can see, for 32-bit integers there are cases where a 32-bit float will underflow, losing the fractional part entirely.

But again, this is just guessing and more of a thought exercise than anything else.

Upvotes: 2
