Why does a custom activation function cause network both zero loss and low accuracy?

Question

I was trying to build a custom activation function using tflearn by making following changes:

add my custom activation function to activation.py

def my_activation(x):
    return tf.where(x >= 0.0, tf.div( x**2 , x + tf.constant(0.6)) , 0.01*x)

and add it to the __init__.py

from .activations import linear, tanh, sigmoid, softmax, softplus, softsign,\
relu, relu6, leaky_relu, prelu, elu, crelu, selu, my_activation

Since tensorflow can perform the gradient calculation automatically, I don't need to implement the gradiate function. As pointed in the article Deep Learning Programming Style,

In the past, whenever someone defined a new model, they had to work out the derivative calculations by hand. While the math is reasonably straightforward, for complex models, it can be time-consuming and tedious work. All modern deep learning libraries make the practitioner/researcher’s job much easier, by automatically solving the problem of gradient calculation.

I trained the model on cifar10 dataset using this code: https://github.com/tflearn/tflearn/blob/master/examples/images/convnet_cifar10.py but changed all relu activations to my_activation.

Sadly, this simple modification cause the network fail to learn anything:

Training Step: 46  | total loss: 0.00002 | time: 1.434s
| Adam | epoch: 001 | loss: 0.00002 - acc: 0.0885 -- iter: 04416/50000
Training Step: 47  | total loss: 0.00002 | time: 1.448s
| Adam | epoch: 001 | loss: 0.00002 - acc: 0.0945 -- iter: 04512/50000
Training Step: 48  | total loss: 0.00001 | time: 1.462s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0927 -- iter: 04608/50000
Training Step: 49  | total loss: 0.00001 | time: 1.476s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0896 -- iter: 04704/50000
Training Step: 50  | total loss: 0.00001 | time: 1.491s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0919 -- iter: 04800/50000
Training Step: 51  | total loss: 0.00001 | time: 1.504s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0890 -- iter: 04896/50000
Training Step: 52  | total loss: 0.00001 | time: 1.518s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0944 -- iter: 04992/50000
Training Step: 53  | total loss: 0.00001 | time: 1.539s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0989 -- iter: 05088/50000
Training Step: 54  | total loss: 0.00001 | time: 1.553s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0951 -- iter: 05184/50000
Training Step: 55  | total loss: 0.00000 | time: 1.567s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0964 -- iter: 05280/50000
Training Step: 56  | total loss: 0.00000 | time: 1.580s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0931 -- iter: 05376/50000
Training Step: 57  | total loss: 0.00000 | time: 1.594s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0903 -- iter: 05472/50000
Training Step: 58  | total loss: 0.00000 | time: 1.613s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0851 -- iter: 05568/50000
Training Step: 59  | total loss: 0.00000 | time: 1.641s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0835 -- iter: 05664/50000
Training Step: 60  | total loss: 0.00000 | time: 1.674s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0834 -- iter: 05760/50000

Since I am just a beginner, I don't know the reason causing the network became both zero loss and low accuracy (NaN output? Deadweight?). Can anybody tell me how to fix this? thanks!

Please Note that I'm not asking how to build a custom activation function. Questions about how to make a custom function:

Why does a custom activation function cause network both zero loss and low accuracy?

Answers (1)

Related Questions