them

Reputation: 228

Neural network toy model to fit sine function fails, what's wrong?

I am a graduate student, new to Keras and neural networks, trying to fit a very simple feedforward neural network to a one-dimensional sine function.

Below are three examples of the best fit that I can get. On the plots, you can see the output of the network vs. the ground truth.

[Plots: neural network output vs. ground truth for runs 1, 2, and 3]

The complete code, just a few lines, is posted here: example Keras.


I have played with the number of layers, different activation functions, different initializations, different loss functions, the batch size, and the number of training samples. It seems that none of these improved the results beyond the examples above.

I would appreciate any comments and suggestions. Is sine a hard function for a neural network to fit? I suspect that the answer is no, so I must be doing something wrong...


There is a similar question here from 5 years ago, but the OP there didn't provide the code, and it is still not clear what went wrong or how they resolved the problem.

Upvotes: 3

Views: 4926

Answers (2)

nemo

Reputation: 57699

Since there is already an answer that provides a workaround, I'm going to focus on problems with your approach.

Input data scale

As others have stated, your input data range of 0 to 1000 is quite big. This can easily be fixed by scaling your input data to zero mean and unit variance (X = (X - X.mean()) / X.std()), which will improve training performance.

For tanh this improvement can be explained by saturation: tanh maps to [-1, 1] and will return either -1 or 1 for almost all sufficiently big (> 3) inputs, i.e. it saturates. In saturation the gradient of tanh is close to zero and nothing is learned. Of course, you could use ReLU instead, which doesn't saturate for values > 0, but then you have a similar problem: the gradients depend (almost) solely on x, so later inputs will always have a higher impact than earlier inputs (among other things).
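As a minimal sketch (assuming the data are built roughly as in your question, X = np.arange(T) and Y = np.sin(X)), the standardization looks like this:

import numpy as np

T = 1000
X = np.arange(T, dtype=np.float32)   # raw inputs, 0 .. 999
Y = np.sin(X)                        # targets

# standardize the inputs: zero mean, unit variance
X_scaled = (X - X.mean()) / X.std()

print(X_scaled.mean(), X_scaled.std())   # roughly 0.0 and 1.0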

While re-scaling or normalization may be a solution, another solution would be to treat your input as a categorical input and map your discrete values to a one-hot encoded vector, so instead of

>>> X = np.arange(T)
>>> X.shape
(1000,)

you would have

>>> X = np.eye(len(X))
>>> X.shape
(1000, 1000)

Of course this might not be desirable if you want to learn continuous inputs.
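For completeness, a minimal sketch of feeding such a one-hot input to a small Keras model (the layer sizes and training settings here are my own illustrative choices, not taken from your code):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

T = 1000
X = np.eye(T)                             # one-hot inputs, shape (1000, 1000)
Y = np.sin(np.arange(T)).reshape(-1, 1)   # targets, shape (1000, 1)

model = Sequential()
model.add(Dense(10, activation='relu', input_dim=T))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=50, batch_size=32, verbose=0)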

Modeling

You are currently trying to model a mapping from a linear function to a non-linear function: you map f(x) = x to g(x) = sin(x). While I understand that this is a toy problem, this way of modeling is limited to this one curve, because f(x) is in no way related to g(x). As soon as you try to model different curves, say both sin(x) and cos(x), with the same network, you will have a problem with your X, as it has exactly the same values for both curves. A better way to model this problem is to predict the next value of the curve, i.e. instead of

X = range(T)
Y = sin(X)

you want

S = sin(range(T))
X = S[:-1]
Y = S[1:]

so for time-step 2 you will get the y value of time-step 1 as input and your loss expects the y value of time-step 2. This way you implicitly model time.
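A minimal sketch of that next-value setup in Keras (the sampling of the curve and the layer sizes below are my own illustrative assumptions):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

T = 1000
S = np.sin(np.linspace(0, 4 * np.pi, T))   # the curve, sampled over a few periods

X = S[:-1].reshape(-1, 1)   # value at time-step i
Y = S[1:].reshape(-1, 1)    # value at time-step i + 1 (prediction target)

model = Sequential()
model.add(Dense(10, activation='tanh', input_dim=1))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=200, batch_size=32, verbose=0)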

Upvotes: 6

BlackBear

Reputation: 22979

In order to make your code work, you need to:

  • scale the input values in the [-1, +1] range (neural networks don't like big values)
  • scale the output values as well, as the tanh activation doesn't work too well close to +/-1
  • use the relu activation instead of tanh in all but the last layer (converges way faster)

With these modifications, I was able to run your code with two hidden layers of 10 and 25 neurons.
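A minimal sketch of those modifications (the data construction, the exact scaling factors, and the training settings are assumptions on my part, since your code is only linked above, not shown here):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

T = 1000
X = np.arange(T, dtype=np.float32)
Y = np.sin(X)

# scale inputs to [-1, +1]; shrink targets away from the tanh saturation region
X_scaled = (2 * X / (T - 1) - 1).reshape(-1, 1)
Y_scaled = (0.9 * Y).reshape(-1, 1)

model = Sequential()
model.add(Dense(10, activation='relu', input_dim=1))   # relu in the hidden layers
model.add(Dense(25, activation='relu'))
model.add(Dense(1, activation='tanh'))                 # tanh only in the last layer
model.compile(optimizer='adam', loss='mse')
model.fit(X_scaled, Y_scaled, epochs=500, batch_size=32, verbose=0)

If you scale the targets as above, remember to undo that scaling (divide the predictions by the same factor) before comparing against the true sine.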

Upvotes: 10
