Innat

Reputation: 17239

TypeError: Using Custom Activation Function while Mixed Precision Enabled?

I was trying to use a custom activation function in a training pipeline with mixed precision enabled, but got the following error:

TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float16 of argument 'x'.

Reproduce

Enabling Mixed precision...

import tensorflow as tf 

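# Note: in TF 2.4+ the non-experimental equivalent is
# tf.keras.mixed_precision.set_global_policy('mixed_float16').
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)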
print('Mixed precision enabled')

Custom activation...

def ARelu(x, alpha=0.90, beta=2.0):
    alpha = tf.clip_by_value(alpha, clip_value_min=0.01, clip_value_max=0.99)
    beta  = 1 + tf.math.sigmoid(beta)
    return tf.nn.relu(x) * beta - tf.nn.relu(-x) * alpha

Training...

import tensorflow as tf

(xtrain, ytrain), (xtest, ytest) = tf.keras.datasets.mnist.load_data()

def pre_process(inputs, targets):
    inputs  = tf.expand_dims(inputs, -1)
    targets = tf.one_hot(targets, depth=10)
    return tf.divide(inputs, 255), targets

train_data = tf.data.Dataset.from_tensor_slices((xtrain, ytrain)).\
    take(10_000).shuffle(10_000).batch(8).map(pre_process)
test_data = tf.data.Dataset.from_tensor_slices((xtest, ytest)).\
    take(1_000).shuffle(1_000).batch(8).map(pre_process)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3), strides=(1, 1),
                           input_shape=(28, 28, 1), activation=ARelu),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),

    tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1),
                           activation=ARelu),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2)),

    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation=ARelu),
    tf.keras.layers.Dense(10, activation='softmax', dtype=tf.float32),
])

opt = tf.keras.optimizers.Adam()

model.compile(loss='categorical_crossentropy', optimizer=opt)
history = model.fit(train_data, validation_data=test_data, epochs=10)

# ------------------

TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float16 of argument 'x'.

However, it works without mixed precision. I understand that the problem is simply a type mismatch, but where should I look to fix it?

Additionally, while trying to solve it, I found that tf.keras.mixed_precision.LossScaleOptimizer is meant to guard against numeric underflow. Is it something that we should use for mixed-precision training?

Upvotes: 0

Views: 475

Answers (2)

George El Haber

Reputation: 66

The solution to the above problem is to cast your alpha and beta to float16, rather than casting the input of your activation function to float32.

DETAILS:

In practice, the main reason for using mixed precision (MP) is to reduce the memory footprint during training. It does so by storing each layer's output in FP16, since memory consumption is dominated by the stored activations rather than the weights. By recasting the layer output to FP32 inside the custom activation function, you lose these savings and may even need more memory than full-precision training, because two copies of the activation exist.
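
As a minimal sketch of the suggestion above: the constants can be cast to whatever dtype the input has, so the function works under both float16 and float32. The name arelu_same_dtype and the sample tensor are illustrative assumptions, not part of the original post.

```python
import tensorflow as tf

def arelu_same_dtype(x, alpha=0.90, beta=2.0):
    # Cast the constants to the input's dtype (float16 under mixed_float16)
    # so the multiplications see matching operand types.
    alpha = tf.cast(alpha, x.dtype)
    beta = tf.cast(beta, x.dtype)
    alpha = tf.clip_by_value(alpha, clip_value_min=0.01, clip_value_max=0.99)
    beta = 1 + tf.math.sigmoid(beta)
    return tf.nn.relu(x) * beta - tf.nn.relu(-x) * alpha

# Hypothetical float16 input, standing in for a layer output under mixed precision.
x = tf.constant([-2.0, 0.0, 3.0], dtype=tf.float16)
y = arelu_same_dtype(x)
print(y.dtype)
```

Because the cast follows x.dtype instead of hard-coding float16, the same function should also keep working when mixed precision is turned off.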

Upvotes: 2

Innat

Reputation: 17239

To solve this, I had to cast the input to float32. I'm not sure, though, whether this is the right way to fix the error.

def ARelu(x, alpha=0.90, beta=2.0):
    alpha = tf.clip_by_value(alpha, clip_value_min=0.01, clip_value_max=0.99)
    beta  = 1 + tf.math.sigmoid(beta)
    x = tf.cast(x, 'float32')
    return tf.nn.relu(x) * beta - tf.nn.relu(-x) * alpha

Simply casting the input to float32 makes it work.

Details

However, to leverage mixed precision, we have to do the following:

# At the beginning ....
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)
print('Mixed precision enabled')

and

# On the last layer, set dtype to float32
tf.keras.layers.Dense(num_classes, activation=..., dtype=tf.float32)])

Honestly, I still don't know how the mixed-precision mechanism works behind the scenes. First, the policy is set to mixed_float16, and then the output activation is cast to tf.float32. Because of this, the custom activation function can't be used unless the input x, which I believe is float16 under mixed precision, is cast to float32.
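
One way to check the belief that x arrives as float16 is to inspect the dtypes directly. The sketch below assumes TF 2.4+ and uses the non-experimental tf.keras.mixed_precision.Policy rather than the experimental API shown above; the small Dense layer and zero input are only illustrative.

```python
import tensorflow as tf

# A mixed_float16 policy has separate compute and variable dtypes.
policy = tf.keras.mixed_precision.Policy('mixed_float16')
print('compute dtype: ', policy.compute_dtype)
print('variable dtype:', policy.variable_dtype)

# Passing the policy to a single layer avoids changing the global policy.
layer = tf.keras.layers.Dense(4, dtype=policy)
out = layer(tf.zeros((1, 3)))  # the float32 input is cast to the compute dtype

# The layer output is what a custom activation would receive as x.
print('output dtype:', out.dtype)
print('kernel dtype:', layer.kernel.dtype)
```

If the output dtype prints as float16 while the kernel stays float32, that would be consistent with the Mul error: the activation receives float16 tensors, while constants created from Python floats default to float32.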

Upvotes: 2
