kuonb

Reputation: 208

Machine learning: why doesn't the cost function need to be differentiable?

I was playing around with TensorFlow, creating a customized loss function, and this question about machine learning in general came to mind.

My understanding is that the optimization algorithm needs a differentiable cost function to find/approach a minimum. However, we can use functions that are non-differentiable, such as the absolute value function (which has no derivative at x = 0). As a more extreme example, I defined my cost function like this:

import tensorflow as tf

def customLossFun(x, y):
    # Ignores y entirely and returns only the sign of x.
    return tf.sign(x)

and I expected an error when running the code, but it actually worked (it didn't learn anything, but it didn't crash).
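For reference, here's a minimal standalone version of what I ran (assuming TensorFlow 2.x eager mode; the variable, factor, and learning rate are made up):

import tensorflow as tf

# customLossFun as defined above
w = tf.Variable(1.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(5):
    with tf.GradientTape() as tape:
        loss = customLossFun(w * 2.0, None)  # no error is raised
    opt.apply_gradients([(tape.gradient(loss, w), w)])
    print(float(w))  # stays at 1.0: nothing is learned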

Am I missing something?

Upvotes: 0

Views: 1163

Answers (4)

Joshua R.

Reputation: 2302

In order to prevent TensorFlow from throwing an error, the only real requirement is that your cost function evaluates to a number for any value of your input variables. From a purely "will it run" perspective, it doesn't know/care about the form of the function it's trying to minimize.

In order for your cost function to provide a meaningful result when TensorFlow uses it to train a model, it additionally needs to 1) get smaller as your model does better and 2) be bounded from below (i.e. it can't go to negative infinity). It's not generally necessary for it to be smooth (e.g. abs(x) has a kink where the sign flips). TensorFlow is always able to compute gradients at any location using automatic differentiation (https://en.wikipedia.org/wiki/Automatic_differentiation, https://www.tensorflow.org/versions/r0.12/api_docs/python/train/gradient_computation).
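For example, here is a minimal sketch (assuming TensorFlow 2.x eager mode; the r0.12 docs linked above use an older API) that asks for the gradient of abs(x) right at its kink:

import tensorflow as tf

x = tf.Variable(0.0)
with tf.GradientTape() as tape:
    y = tf.abs(x)

# abs has no true derivative at x = 0, but autodiff still returns a value:
# TensorFlow uses sign(x) as the gradient of abs, which gives 0 at x = 0.
print(float(tape.gradient(y, x)))  # 0.0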

Of course, those gradients are of more use if you've chosen a meaningful cost function that isn't too flat.

Upvotes: 1

Andre Holzner

Reputation: 18675

If it didn't learn anything, what have you gained? Your loss function is differentiable almost everywhere, but it is flat almost everywhere, so the minimizer can't figure out the direction towards the minimum.

If you start out at a positive value, the optimizer will most likely stay stuck at a random value on the positive side, even though the minima on the left side are better (have a lower value).

TensorFlow can be used for calculations in general: it provides a mechanism to automatically find the derivative of a given expression, and it can do so across different compute platforms (CPU, GPU) and distributed over multiple GPUs and servers if needed.

But what you implement in TensorFlow does not necessarily have to be an objective function to be minimized. You could use it, e.g., to draw random numbers and perform Monte Carlo integration of a given function.
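For instance, a minimal sketch of that idea (assuming TensorFlow 2.x; integrating x**2 over [0, 1] is just an arbitrary example):

import tensorflow as tf

# Monte Carlo estimate of the integral of x**2 over [0, 1]
# (exact value: 1/3). No loss function and no training involved.
samples = tf.random.uniform([100000])  # uniform draws in [0, 1)
estimate = tf.reduce_mean(tf.square(samples))
print(float(estimate))  # ~0.333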

Upvotes: 0

Maxim

Reputation: 53758

Ideally, the cost function needs to be smooth everywhere to apply gradient-based optimization methods (SGD, Momentum, Adam, etc.). But nothing is going to crash if it isn't; you may just have issues converging to a local minimum.

When the function is non-differentiable at a certain point x, it's possible to get large oscillations if the neural network converges to this x. E.g., if the loss function is tf.abs(x), it's possible that the network weights stay mostly positive, so that x > 0 at all times and the network never notices the kink in tf.abs. However, it's more likely that x will bounce around 0, so that the gradient alternates between positive and negative. If the learning rate is not decaying, the optimization won't converge to the local minimum, but will bounce around it.
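A minimal sketch of that bouncing behavior (assuming TensorFlow 2.x; the starting value and learning rate here are arbitrary):

import tensorflow as tf

w = tf.Variable(0.3)
opt = tf.keras.optimizers.SGD(learning_rate=0.25)  # constant, non-decaying

for _ in range(6):
    with tf.GradientTape() as tape:
        loss = tf.abs(w)
    opt.apply_gradients([(tape.gradient(loss, w), w)])
    print(round(float(w), 2))  # 0.05, -0.2, 0.05, -0.2, ... never settles at 0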

In your particular case, the gradient is zero all the time, so nothing's going to change at all.

Upvotes: 0

nessuno

Reputation: 27052

You're missing the fact that the gradient of the sign function is manually defined in the TensorFlow source code.

As you can see here:

@ops.RegisterGradient("Sign")
def _SignGrad(op, _):
  """Returns 0."""
  x = op.inputs[0]
  return array_ops.zeros(array_ops.shape(x), dtype=x.dtype)

the gradient of tf.sign is defined to be always zero. This, of course, matches the actual derivative wherever it exists, i.e. everywhere except at zero.

The TensorFlow authors decided not to check whether the input is zero and throw an exception in that specific case.
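You can check this from user code with a minimal sketch (assuming TensorFlow 2.x eager mode):

import tensorflow as tf

for value in (-2.0, 0.0, 5.0):
    x = tf.Variable(value)
    with tf.GradientTape() as tape:
        y = tf.sign(x)
    print(float(tape.gradient(y, x)))  # 0.0 for every input, including x = 0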

Upvotes: 1
