Reputation: 821
I'm trying to build a simple multivariate linear regression with Lasagne. This is my input:
x_train = np.array([[37.93, 139.5, 329., 16.64,
16.81, 16.57, 1., 707.,
39.72, 149.25, 352.25, 16.61,
16.91, 16.60, 40.11, 151.5,
361.75, 16.95, 16.98, 16.79]]).astype(np.float32)
y_train = np.array([37.92, 138.25, 324.66, 16.28, 16.27, 16.28]).astype(np.float32)
For this single training example, the network should be able to learn y perfectly.
Here is the model:
i1 = T.matrix()
y = T.vector()
lay1 = lasagne.layers.InputLayer(shape=(None,20),input_var=i1)
out1 = lasagne.layers.get_output(lay1)
lay2 = lasagne.layers.DenseLayer(lay1, 6, nonlinearity=lasagne.nonlinearities.linear)
out2 = lasagne.layers.get_output(lay2)
params = lasagne.layers.get_all_params(lay2, trainable=True)
cost = T.sum(lasagne.objectives.squared_error(out2, y))
grad = T.grad(cost, params)
updates = lasagne.updates.sgd(grad, params, learning_rate=0.1)
f_train = theano.function([i1, y], [out1, out2, cost], updates=updates)
After executing f_train(x_train, y_train) multiple times, the cost explodes to infinity. Any idea what is going wrong here?
Thanks!
Upvotes: 1
Views: 1541
Reputation: 34177
The network has too much capacity for a single training instance. You would need to apply strong regularization to keep training from diverging. Alternatively, and hopefully more realistically, give it richer training data (many instances).
With a single instance, the task can be solved using just one input instead of 20, and with the DenseLayer's bias disabled:
import numpy as np
import theano
import theano.tensor as T
import lasagne


def compile():
    # x: inputs of shape (batch, 1); z: targets of shape (batch, 6)
    x, z = T.matrices('x', 'z')
    lh = lasagne.layers.InputLayer(shape=(None, 1), input_var=x)
    # Linear output layer with the bias disabled (b=None)
    ly = lasagne.layers.DenseLayer(lh, 6, nonlinearity=lasagne.nonlinearities.linear,
                                   b=None)
    y = lasagne.layers.get_output(ly)
    params = lasagne.layers.get_all_params(ly, trainable=True)
    cost = T.sum(lasagne.objectives.squared_error(y, z))
    # sgd accepts either a scalar loss or a list of gradients
    updates = lasagne.updates.sgd(cost, params, learning_rate=0.0001)
    return theano.function([x, z], [y, cost], updates=updates)


def main():
    f_train = compile()
    x_train = np.array([[37.93]]).astype(theano.config.floatX)
    y_train = np.array([[37.92, 138.25, 324.66, 16.28, 16.27, 16.28]])\
        .astype(theano.config.floatX)
    # range and print() work under both Python 2 and Python 3
    for _ in range(100):
        print(f_train(x_train, y_train))


main()
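Note that the learning rate also has to be reduced substantially (from 0.1 to 0.0001 here) to prevent divergence: the inputs are in the tens to hundreds, so the gradients of the squared error with respect to the weights are large.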
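To illustrate why the step size matters so much, here is a rough sketch in plain Python (no Theano or Lasagne) of the same update rule: gradient descent on the summed squared error of a bias-free linear layer, with one training instance. The `train` helper, the zero weight initialisation and the 50-step count are assumptions made for this illustration; Lasagne initialises weights randomly, so its numbers will differ, but the stability argument is the same.

```python
# Plain-Python sketch (no Theano/Lasagne): gradient descent on
# cost = sum_j (sum_i x[i] * w[i][j] - y[j])^2 for ONE training instance.
# Assumptions for illustration: zero initial weights, no bias term.

def train(x, y, learning_rate, steps):
    """Run `steps` updates; return the cost measured before each update."""
    weights = [[0.0] * len(y) for _ in x]  # len(x) rows, len(y) columns
    costs = []
    for _ in range(steps):
        # Forward pass: one prediction per output unit.
        preds = [sum(x[i] * weights[i][j] for i in range(len(x)))
                 for j in range(len(y))]
        residuals = [p - t for p, t in zip(preds, y)]
        costs.append(sum(r * r for r in residuals))
        # d cost / d w[i][j] = 2 * x[i] * residual[j]
        for i in range(len(x)):
            for j in range(len(y)):
                weights[i][j] -= learning_rate * 2.0 * x[i] * residuals[j]
    return costs


x = [37.93]
y = [37.92, 138.25, 324.66, 16.28, 16.27, 16.28]

# Each update multiplies every residual by (1 - 2 * lr * sum(x_i^2)),
# so the residuals shrink only if lr < 1 / sum(x_i^2).
threshold = 1.0 / sum(v * v for v in x)

small = train(x, y, learning_rate=0.0001, steps=50)
large = train(x, y, learning_rate=0.1, steps=50)

print("largest stable learning rate: %g" % threshold)
print("last recorded cost, lr=0.0001: %g" % small[-1])
print("last recorded cost, lr=0.1:    %g" % large[-1])
```

For the single input 37.93 the bound is about 0.0007, which is why 0.0001 converges and 0.1 blows up. In the 20-input version from the question the sum of squares is dominated by values such as 707, so the bound drops below roughly 2e-6 and a learning rate of 0.1 is several orders of magnitude too large.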
Upvotes: 0