Reputation: 228
I'm a graduate student, new to Keras and neural networks, and I was trying to fit a very simple feedforward neural network to a one-dimensional sine.
Below are three examples of the best fit I could get. In the plots you can see the output of the network vs. the ground truth.
The complete code, just a few lines, is posted here: example Keras
I played with the number of layers, different activation functions, different initializations, different loss functions, the batch size, and the number of training samples. None of these seemed to improve the results beyond the examples above.
I would appreciate any comments and suggestions. Is a sine a hard function for a neural network to fit? I suspect the answer is no, so I must be doing something wrong...
There is a similar question here from 5 years ago, but the OP there didn't provide the code, and it is still not clear what went wrong or how they resolved the problem.
Upvotes: 3
Views: 4926
Reputation: 57699
Since there is already an answer that provides a workaround, I'm going to focus on problems with your approach.
As others have stated, your input data range of 0 to 1000 is quite large. This problem can easily be solved by scaling your input data to zero mean and unit variance (X = (X - X.mean()) / X.std()), which will result in improved training performance. For tanh this improvement can be explained by saturation: tanh maps to [-1, 1] and will therefore return either -1 or 1 for almost all sufficiently large (> 3) x, i.e. it saturates. In saturation the gradient of tanh is close to zero and nothing is learned. Of course, you could use ReLU instead, which won't saturate for values > 0; however, you would then have a similar problem, as the gradients would depend (almost) solely on x, and therefore later inputs would always have a higher impact than earlier inputs (among other things).
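For illustration, a minimal sketch of that standardization step (the variable names and the use of NumPy are my assumptions about the original code):

import numpy as np

T = 1000
X = np.arange(T, dtype=np.float32)   # raw inputs 0 .. 999
Y = np.sin(X)                        # targets

# Scale the inputs to zero mean and unit variance so that tanh
# stays out of its saturated region.
X = (X - X.mean()) / X.std()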
While re-scaling or normalization may be a solution, another solution would be to treat your input as a categorical input and map your discrete values to a one-hot encoded vector, so instead of
>>> X = np.arange(T)
>>> X.shape
(1000,)
you would have
>>> X = np.eye(len(X))
>>> X.shape
(1000, 1000)
Of course this might not be desirable if you want to learn continuous inputs.
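For illustration, a minimal sketch of feeding such one-hot inputs into a small Keras model (the layer size, optimizer, and training settings are my assumptions, not taken from the original code):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

T = 1000
X = np.eye(T)                  # one-hot encoded inputs, shape (1000, 1000)
Y = np.sin(np.arange(T))       # targets

model = Sequential()
model.add(Dense(25, activation='tanh', input_shape=(T,)))  # illustrative hidden layer
model.add(Dense(1))                                        # linear output for regression
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=100, batch_size=32, verbose=0)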
You are currently trying to model a mapping from a linear function to a non-linear function: you map f(x) = x to g(x) = sin(x). While I understand that this is a toy problem, this way of modeling is limited to only this one curve, as f(x) is in no way related to g(x). As soon as you try to model different curves, say both sin(x) and cos(x), with the same network, you will have a problem with your X, as it has exactly the same values for both curves. A better approach to modeling this problem is to predict the next value of the curve, i.e. instead of
X = range(T)
Y = sin(X)
you want
X = sin(range(T))[:-1]
Y = sin(range(T))[1:]
so for time-step 2 you will get the y value of time-step 1 as input, and your loss expects the y value of time-step 2. This way you implicitly model time.
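For illustration, a minimal sketch of that data setup (T and the use of NumPy are my assumptions):

import numpy as np

T = 1000
series = np.sin(np.arange(T))

# Predict the next value of the curve from the current one.
X = series[:-1].reshape(-1, 1)   # sin at time-step t
Y = series[1:]                   # sin at time-step t + 1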
Upvotes: 6
Reputation: 22979
In order to make your code work, you need to:
With these modifications, I was able to run your code with two hidden layers of 10 and 25 neurons.
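Since the modified code itself isn't reproduced here, the following is only a rough sketch of a setup along those lines: two hidden layers of 10 and 25 neurons trained on standardized inputs (the activation, optimizer, and number of epochs are my assumptions):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

T = 1000
X = np.arange(T, dtype=np.float32)
Y = np.sin(X)
X = ((X - X.mean()) / X.std()).reshape(-1, 1)   # standardize the inputs, as discussed above

model = Sequential()
model.add(Dense(10, activation='tanh', input_dim=1))   # first hidden layer
model.add(Dense(25, activation='tanh'))                # second hidden layer
model.add(Dense(1))                                    # linear output for regression
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=1000, batch_size=32, verbose=0)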
Upvotes: 10