Locus

Reputation: 179

Deconvolutional autoencoder in theano

I'm new to theano and trying to use the convolutional network and denoising autoencoder examples to build a denoising convolutional network. I'm currently struggling with how to make W', the reverse weights. In this paper they use tied weights for W' that are flipped in both dimensions.

I'm currently working on a 1d signal, so my image shape is (batch_size, 1, 1, 1000) and the filter/W shape is (num_kernels, 1, 1, 10), for example. The output of the convolution is then (batch_size, num_kernels, 1, 991). Since I want W' to simply be W flipped in two dimensions (or one dimension in my case), I'm tempted to do this:

w_value = numpy_rng.uniform(low=-W_bound, high=W_bound, size=filter_shape)
self.W = theano.shared(np.asarray(w_value, dtype=theano.config.floatX), borrow=True)
self.W_prime = T.repeat(self.W[:, :, :, ::-1], num_kernels, axis=1)

where I flip W along the relevant dimension and repeat those weights so that they match the dimensions of the feature maps from the hidden layer.

With this setup, do I only need to compute the gradients with respect to W for the updates, or should W_prime also be part of the gradient computation?

When I do it like this, the MSE drops a lot after the first minibatch and then stops changing. Using cross entropy gives NaN from the first iteration. I don't know if that is related to this issue or if it's one of many other potential bugs I have in my code.

Upvotes: 2

Views: 387

Answers (1)

Daniel Renshaw

Reputation: 34177

I can't comment on the validity of your W_prime approach, but I can say that you only need to compute the gradient of the cost with respect to each of the original shared variables. Your W_prime is a symbolic function of W, not a shared variable itself, so you don't need to compute gradients with respect to W_prime.
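
For illustration, here is a minimal, self-contained sketch of what that looks like; the cost is a stand-in and the sizes are made up, not taken from your code:

import numpy as np
import theano
import theano.tensor as T

# Toy example: W is the only shared variable; W_prime is derived from it.
rng = np.random.RandomState(0)
w_value = rng.uniform(low=-0.1, high=0.1, size=(4, 1, 1, 10))
W = theano.shared(np.asarray(w_value, dtype=theano.config.floatX), borrow=True)
W_prime = W[:, :, :, ::-1]  # symbolic expression built from W, not a shared variable

# Stand-in cost that depends on W both directly and through W_prime.
cost = T.sum(W ** 2) + T.sum(W_prime ** 2)

# Differentiate with respect to W only; Theano backpropagates through W_prime.
gW, = T.grad(cost, [W])
learning_rate = 0.01
train = theano.function([], cost, updates=[(W, W - learning_rate * gW)])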

Whenever you get NaNs, the first thing to try is to reduce the size of the learning rate.
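
One way to make that easy to try (continuing the sketch above, which is only an assumption about your training setup) is to keep the learning rate in a shared variable so you can shrink it without recompiling the training function:

learning_rate = theano.shared(np.asarray(0.01, dtype=theano.config.floatX))
train = theano.function([], cost, updates=[(W, W - learning_rate * gW)])

# If the cost turns into NaN, scale the rate down and keep training, e.g.:
learning_rate.set_value(np.asarray(learning_rate.get_value() * 0.1,
                                   dtype=theano.config.floatX))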

Upvotes: 1
