Reputation: 131
I'm trying lasagne and nolearn's NeuralNet class to approximate a simple sine function. After all, neural nets are proven to be universal approximators, so I wanted to try lasagne on a simple non-linear function to demonstrate that fact experimentally. This is the code:
import lasagne
import numpy as np
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
import matplotlib.pylab as pylab

x = np.linspace(0, 1, 1000)
y = np.sin(8 * x)

# Fit the dimensions and scale
x = x.reshape(1000, 1).astype(np.float32)
y = y.astype(np.float32)
y = (y - np.min(y)) / np.max(y)
We get the following function:
pylab.plot(x, y)
pylab.show()
Now we create a simple neural net with 100 hidden units to approximate the function:
net = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
    ],
    input_shape=(None, 1),
    hidden_num_units=100,
    hidden_nonlinearity=lasagne.nonlinearities.rectify,
    hidden_W=lasagne.init.GlorotUniform(),
    output_num_units=1,
    output_nonlinearity=None,
    output_W=lasagne.init.GlorotUniform(),
    update=nesterov_momentum,
    update_learning_rate=0.001,
    update_momentum=0.9,
    regression=True,
    max_epochs=500,
    verbose=0,
)

net = net.fit(x, y)
Now we predict on the same x values with the trained net to see what we get:
yp = net.predict(x)
pylab.plot(x, yp)  # successive plot calls draw on the same axes by default; the old `hold` keyword was removed from matplotlib
pylab.plot(x, y)
pylab.show()
And this is what we get (plot of the approximated function against the target): it's ridiculous! Increasing the number of hidden neurons or the training epochs changes nothing, and other types of nonlinearities only make it worse. In theory this should work much better. What am I missing?
Thank you very much.
Upvotes: 1
Views: 140
Reputation: 131
I finally figured out what was happening. I'm posting my guess in case anyone runs into the same problem.

As is known, nolearn's NeuralNet uses mini-batch training. I don't know exactly how it picks the batches, but it seems to me that it picks them sequentially. In that case, if the data are not randomized, a batch is not statistically representative of the whole dataset (the data are not stationary). In my case I made x = np.linspace(0, 1, 1000), so the statistical properties of each batch differ because the data have a natural order.
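A rough way to see what I mean (a sketch only; the batch size of 128 is an assumption for illustration, not necessarily what nolearn uses): consecutive slices of a sorted linspace each cover only a narrow sub-interval of [0, 1].

import numpy as np

x = np.linspace(0, 1, 1000).reshape(1000, 1).astype(np.float32)
batch_size = 128  # assumed batch size, just for illustration
for i in range(0, len(x), batch_size):
    batch = x[i:i + batch_size]
    print("batch %d sees x only in [%.2f, %.2f]" % (i // batch_size, batch.min(), batch.max()))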
If you instead create the data randomly, i.e. x = np.random.uniform(size=[1000, 1]), each batch is statistically representative regardless of where it is taken from. Once you do this, you can increase the number of training epochs and the network converges much closer to the true optimum. I don't know whether my guess is correct, but at least it worked for me. Nevertheless, I will dig into it further.
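For reference, a minimal sketch of the fix, reusing the net defined in the question (the float32 casts and the scaling are kept as in the original code):

import numpy as np

# Sample x at random instead of on an ordered grid, so every mini-batch
# covers the whole input range.
x = np.random.uniform(size=[1000, 1]).astype(np.float32)
y = np.sin(8 * x).ravel().astype(np.float32)
y = (y - np.min(y)) / np.max(y)  # same scaling as in the question

net = net.fit(x, y)  # net is the NeuralNet instance from the question
yp = net.predict(x)

Shuffling the original linspace data with np.random.permutation before calling fit should have a similar effect.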
Upvotes: 2