Reputation: 325
I am trying to implement a cost function in Theano for a feed-forward neural network with multiple hidden layers. The cost function is
cost=((W1*W1).sum()+(b1*b1).sum()+(W2*W2).sum()+(b2*b2).sum())*reg_lambda
However, I decide the number of hidden layers at runtime through a constructor argument to the network class. So the number of Ws and bs is only known at runtime, and hence the expression for the cost has to be built at runtime as well. I could compute the sums of the Ws and bs outside the Theano function and simply pass in the scalar values, but then I would lose the symbolic expression that I need for computing gradients later. How do I build the symbolic expression at runtime?
Upvotes: 1
Views: 222
Reputation: 34177
You can use regular Python loops to construct the cost for a dynamic number of layers. Note that Theano 'run time' and Python 'run time' are two different things: Theano's 'compile time' happens during Python's 'run time', so you can use ordinary Python code to construct dynamic Theano expressions that depend on parameters known only while the Python code is running.
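For instance, here is a minimal sketch (the names reg_lambda and params are placeholders, not anything from your code) of building an L2 penalty over a parameter list whose length is only known at runtime; the result is still a symbolic expression, so tt.grad works on it:

import numpy
import theano
import theano.tensor as tt

# Placeholder setup: three parameter matrices whose count would normally
# come from a constructor argument.
reg_lambda = 0.1
params = [theano.shared(numpy.random.standard_normal((5, 5))
                        .astype(theano.config.floatX))
          for _ in range(3)]

# An ordinary Python loop builds up a single symbolic Theano expression.
l2_cost = 0
for p in params:
    l2_cost = l2_cost + reg_lambda * tt.sum(p ** 2)

# Because l2_cost is symbolic, gradients are still available.
grads = [tt.grad(l2_cost, p) for p in params]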
The cost you give is only L2 regularization of the network parameters. You presumably have additional components for the full cost. Here's a full example.
import numpy
import theano
import theano.tensor as tt


def compile(input_size, hidden_sizes, output_size, reg_lambda, learning_rate):
    ws, bs = [], []
    x = tt.matrix('x')
    x.tag.test_value = numpy.random.standard_normal(size=(2, input_size)) \
        .astype(theano.config.floatX)
    previous_size = input_size
    h = x
    # Hidden layers: one weight matrix and bias vector per entry in hidden_sizes.
    for hidden_size in hidden_sizes:
        w = theano.shared(
            numpy.random.standard_normal(size=(previous_size, hidden_size))
            .astype(theano.config.floatX))
        b = theano.shared(numpy.zeros((hidden_size,), dtype=theano.config.floatX))
        h = tt.tanh(tt.dot(h, w) + b)
        ws.append(w)
        bs.append(b)
        previous_size = hidden_size
    # Output layer.
    w = theano.shared(numpy.random.standard_normal(size=(previous_size, output_size))
                      .astype(theano.config.floatX))
    b = theano.shared(numpy.zeros((output_size,), dtype=theano.config.floatX))
    y = tt.nnet.softmax(tt.dot(h, w) + b)
    ws.append(w)
    bs.append(b)
    z = tt.ivector('z')
    # Cast to int32 so the test value matches the ivector dtype.
    z.tag.test_value = numpy.random.randint(output_size, size=(2,)) \
        .astype(numpy.int32)
    cost = tt.nnet.categorical_crossentropy(y, z).mean()
    # L2 regularization: loop over however many parameter pairs were created above.
    for w, b in zip(ws, bs):
        cost += tt.sum(w ** 2) * reg_lambda
        cost += tt.sum(b ** 2) * reg_lambda
    updates = [(p, p - learning_rate * tt.grad(cost, p)) for p in ws + bs]
    return theano.function([x, z], outputs=[cost], updates=updates)


theano.config.compute_test_value = 'raise'
compile(10, [8, 6, 4, 8, 16], 32, 0.1, 0.01)
Note the second for loop, which adds the L2 regularization components to the cost for each of the layers. The number of layers is passed as a parameter to the function.
Upvotes: 1