Reputation: 179
I'm new to Theano and am trying to combine the convolutional network and denoising autoencoder examples into a denoising convolutional network. I am currently struggling with how to construct W', the reverse weights. In this paper they use tied weights for W' that are flipped in both dimensions.
I'm currently working on a 1D signal, so my image shape is (batch_size, 1, 1, 1000) and the filter/W shape is (num_kernels, 1, 1, 10), for example. The output of the convolution is then (batch_size, num_kernels, 1, 991). Since I want W' to be just W flipped in two dimensions (or one dimension in my case), I'm tempted to do this:
w_value = numpy_rng.uniform(low=-W_bound, high=W_bound, size=filter_shape)
self.W = theano.shared(np.asarray((w_value), dtype=theano.config.floatX), borrow=True)
self.W_prime = T.repeat(self.W[:, :, :, ::-1], num_kernels, axis=1)
where I flip the weights along the relevant dimension and repeat them so that they have the same dimension as the feature maps from the hidden layer.
With this setup, do I only need the gradient with respect to W for the update, or should W_prime also be part of the gradient computation?
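For reference, the shapes involved can be checked with plain NumPy, whose `repeat` and slicing behave like their Theano counterparts here. This is only a sketch: the sizes `batch_size = 4` and `num_kernels = 3` are made up for illustration, and `W_prime_swapped` is a hypothetical alternative layout, not something taken from the question or the paper. Theano's conv2d expects filters laid out as (output feature maps, input feature maps, rows, columns), so it may be worth comparing the two resulting shapes against what the decoding convolution needs.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the question except 1000 and 10).
batch_size, num_kernels, signal_len, filter_len = 4, 3, 1000, 10

# Filter bank laid out as (num_kernels, 1, 1, filter_len), as in the question.
W = np.random.uniform(-0.1, 0.1, size=(num_kernels, 1, 1, filter_len))

# A 'valid' convolution shortens the signal to signal_len - filter_len + 1.
signal = np.random.randn(signal_len)
hidden_len = np.convolve(signal, W[0, 0, 0], mode="valid").shape[0]

# Flip along the last axis, then repeat along axis 1 (what the question does).
W_flipped = W[:, :, :, ::-1]
W_prime_repeat = np.repeat(W_flipped, num_kernels, axis=1)

# Hypothetical alternative: swap the first two axes instead of repeating.
W_prime_swapped = np.swapaxes(W_flipped, 0, 1)

print(hidden_len)             # 991
print(W_prime_repeat.shape)   # (3, 3, 1, 10)
print(W_prime_swapped.shape)  # (1, 3, 1, 10)
```

The repeated version has `num_kernels` output maps, while the swapped version has a single output map fed by `num_kernels` input maps. Which of the two matches the intended reconstruction is a design decision the question leaves open.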
When I do it like this, the MSE drops a lot after the first minibatch and then stops changing. Using cross entropy gives NaN from the first iteration. I don't know if that is related to this issue or if it's one of many other potential bugs I have in my code.
Upvotes: 2
Views: 387
Reputation: 34177
I can't comment on the validity of your W_prime approach, but I can say that you only need to compute the gradient of the cost with respect to each of the original shared variables. Your W_prime is a symbolic function of W, not a shared variable itself, so you don't need to compute gradients with respect to W_prime.
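To see why differentiating with respect to W alone is enough, here is a small NumPy sketch of the chain rule involved. It is not Theano code and not the asker's network: the matrices `A` and `B`, the target `y`, and the quadratic cost are hypothetical stand-ins for "the part of the cost that uses w" and "the part that uses the flipped copy". In Theano, a call such as `T.grad(cost, self.W)` would follow both paths automatically. The sketch writes them out by hand and compares the result with finite differences.

```python
import numpy as np

rng = np.random.RandomState(0)
k = 5
A = rng.randn(8, k)   # hypothetical: how the cost depends on w directly
B = rng.randn(8, k)   # hypothetical: how the cost depends on w_prime
y = rng.randn(8)

def cost(w):
    w_prime = w[::-1]              # derived from w, not a separate parameter
    r = A @ w + B @ w_prime - y
    return 0.5 * np.sum(r ** 2)

def grad_wrt_w(w):
    r = A @ w + B @ w[::-1] - y
    direct = A.T @ r               # contribution through w itself
    via_prime = (B.T @ r)[::-1]    # contribution through w_prime, flipped back
    return direct + via_prime

w = rng.randn(k)
eps = 1e-6
# Central finite differences, one coordinate of w at a time.
numeric = np.array([
    (cost(w + eps * e) - cost(w - eps * e)) / (2 * eps)
    for e in np.eye(k)
])
grads_match = np.allclose(numeric, grad_wrt_w(w), atol=1e-5)
print(grads_match)
```

The gradient with respect to `w` already contains the contribution that flows through `w_prime`, so there is no separate parameter to update.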
Whenever you get NaNs, the first thing to try is reducing the learning rate.
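As a toy illustration of how an overly large learning rate can end in NaN, consider plain gradient descent on f(x) = x**2. This is a deliberately minimal sketch with a hypothetical helper `descend`. It says nothing about which learning rate suits the network in the question, only that the same update rule can either converge or overflow depending on the step size.

```python
import math

def descend(lr, steps=2000, x=1.0):
    """Plain gradient descent on f(x) = x**2, whose gradient is 2*x."""
    for _ in range(steps):
        x = x - lr * 2.0 * x
    return x

small = descend(lr=0.1)   # each step multiplies x by 0.8, so x shrinks
large = descend(lr=1.5)   # each step multiplies x by -2, so |x| blows up

print(abs(small) < 1e-6)  # True
print(math.isnan(large))  # True: the value overflows to inf, then inf - inf
```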
Upvotes: 1