Sus20200

Reputation: 337

How to restrict the output of Neural Network to be positive in Python, Keras

I use the Keras package:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(100, input_shape=(52,)),
    Activation('relu'),
    Dense(40),
    Activation('softmax'),
    Dense(1),
    Activation('tanh')
])

model.compile(optimizer='sgd',
              loss='mean_absolute_error')
model.fit(train_x2, train_y, epochs=200, batch_size=52)

How can I adjust it so that it takes into account that the output should be positive? I can clip the output at the end, but I want the network to consider this fact while learning.

Upvotes: 4

Views: 10053

Answers (2)

Azmisov

Reputation: 7253

There are various strategies here, and the choice really depends on your use case. They all have different properties that can affect how the neural network behaves:

  • sigmoid or shifted tanh activation: These are overly restrictive, confining the output to [0, 1] rather than just positive values. In addition, the gradients toward the high/low ends get very small, so if samples get stuck there it may take a very long time to move back toward the center of the sigmoid/tanh. If you are okay with a restricted range like this, there are many more activation functions you can look at and pick from.

  • relu activation: The problem here is that the gradient is zero when the pre-activation is below zero, so if samples get stuck there they won't learn anymore. However, it is very fast to compute and tends to perform well on many problems despite the zero gradient in the negative domain.

  • softplus activation: A smooth version of ReLU, so it never gets stuck in a zero-gradient region. However, like sigmoid/tanh, learning gets slower as the pre-activation becomes more negative.

  • exp(output): I found this tends to be a bit less stable, since exp increases very rapidly and can easily blow up. However, when paired with other functions it may still work, such as a downstream log/ln or softmax.

  • square(output): This is a smooth function with a linear gradient. The disadvantage is that squaring can make your values explode (or vanish if |output| < 1), so careful normalization is needed to prevent that. This transform is commonly used in loss functions, like MSE.

  • abs(output): This is linear, so the advantage over square is that it doesn't change the magnitude of the value and learning proceeds at a constant rate. It does have a discontinuity in the gradient at zero, though, which can lead to cliff-like gradient topology when the output is close to zero and an update skips across the discontinuity (gradient clipping may help here).

  • piecewise: 0.5*x^2 for |x| < 1, |x| - 0.5 for |x| >= 1: This blends the smoothness of square with the linearity of abs, so it has the advantages of both. The disadvantage is that the piecewise conditional makes it slower to compute (though arguably still faster than exp or softplus). I'm not sure if anyone has coined a name for this already, but perhaps it can be called softabs (see the sketch after this list). If you are normalizing your data, x will likely always be < 1, so in that case you're probably fine just using square. This link has some additional ideas for smooth absolute value functions that might fit your needs better.

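As a concrete sketch of a couple of these options (using the 52-feature input and layer sizes from the question), the output layer can either use a built-in positive activation such as softplus, or stay linear and be wrapped in a Lambda transform. The softabs helper below is just one possible branch-free implementation of the piecewise function above (it has the same form as the Huber loss with delta = 1); it is not a built-in Keras activation, and the hidden-layer activations are illustrative:

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, Lambda

def softabs(x):
    # 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise, written Huber-style without a branch
    quadratic = K.minimum(K.abs(x), 1.0)
    linear = K.abs(x) - quadratic
    return 0.5 * K.square(quadratic) + linear

# Option 1: a built-in positive activation on the output layer
model = Sequential([
    Dense(100, input_shape=(52,), activation='relu'),
    Dense(40, activation='relu'),
    Dense(1, activation='softplus'),   # output is always > 0
])

# Option 2: a linear output followed by a positive transform
model = Sequential([
    Dense(100, input_shape=(52,), activation='relu'),
    Dense(40, activation='relu'),
    Dense(1),           # linear output
    Lambda(softabs),    # mapped to a non-negative value
])

model.compile(optimizer='sgd', loss='mean_absolute_error')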

Also, one last note: if you just want to make a trained weight parameter positive, use a weight constraint (e.g. abs) instead!
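For instance, in Keras a constraint can be attached to a layer's kernel. The AbsConstraint class below is a hypothetical custom constraint implementing the abs idea; Keras's built-in NonNeg constraint clips negative weights to zero instead, but achieves the same goal:

from keras import backend as K
from keras.constraints import Constraint, NonNeg
from keras.layers import Dense

class AbsConstraint(Constraint):
    # Replace each weight with its absolute value after every update
    def __call__(self, w):
        return K.abs(w)

layer = Dense(1, kernel_constraint=AbsConstraint())  # abs-based, as described above
layer = Dense(1, kernel_constraint=NonNeg())         # or the built-in clip-to-zero constraint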

Upvotes: 16

Vikash Singh

Reputation: 14001

You can change the activation function to relu => f(x) = max(0, x), so the output can never be negative:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(100, input_shape=(52,), kernel_initializer='normal', activation='relu'))
model.add(Dense(40, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='relu'))
model.compile(loss='mean_absolute_error', optimizer='sgd')
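Training then proceeds as in the question (train_x2 and train_y are the arrays from the original post):

model.fit(train_x2, train_y, epochs=200, batch_size=52)

Since the final relu clamps negative pre-activations to zero, every prediction will be >= 0.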

Upvotes: 6
