EdoG

Reputation: 355

How to sweep a neural network across an image with tensorflow?

My question is about finding an efficient way (mostly in terms of parameter count) to implement a sliding window in tensorflow (1.4), in order to apply a neural network across an image and produce a 2-d map in which each pixel (or region) represents the network's output for the corresponding receptive field (which in this case is the sliding window itself).

In practice, I'm trying to implement either an MTANN or a PatchGAN using tensorflow, but I cannot understand the implementation I found.

The two architectures can be briefly described as:

- MTANN (Massive Training Artificial Neural Network): a small regression network trained on a large number of overlapping subregions extracted from the training images, then applied in a sliding-window fashion over the whole image to produce a likelihood map.
- PatchGAN: a GAN discriminator that classifies each NxN patch of its input as real or fake; it is run convolutionally over the image, so its output is a 2-d grid of patch-level predictions.

While I couldn't find any tensorflow implementation of MTANN, I did find a PatchGAN implementation, which is treated as a plain convolutional network, but I couldn't figure out how to apply this idea in practice.

Let's say I have a pre-trained network whose output tensor I can access. I understand that convolution is the way to go, since a convolutional layer operates over a local region of the input, and what I'm trying to do can clearly be represented as a convolutional network. However, what if I already have the network that generates the sub-maps from a given fixed-size window?

E.g. I got a tensor

sub_map = network(input_patch)

which returns a [1,2,2,1] map from a [1,8,8,3] image (corresponding to a 3-layer FCN with input size 8, filter size 3x3). How can I sweep this network over [1,64,64,3] images, in order to produce a [1,64,64,1] map composed of each spatial contribution, like it happens in a convolution?
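For concreteness, here is a minimal sketch of the patch network I have in mind (TF 1.4; the filter counts are placeholders, not my actual model):

    import tensorflow as tf

    def network(x):
        # Three 3x3 conv layers with VALID padding: each layer shrinks the
        # spatial dims by 2, so an 8x8 input yields a 2x2 output map.
        h = tf.layers.conv2d(x, 16, 3, padding='valid', activation=tf.nn.relu)
        h = tf.layers.conv2d(h, 16, 3, padding='valid', activation=tf.nn.relu)
        return tf.layers.conv2d(h, 1, 3, padding='valid')

    input_patch = tf.placeholder(tf.float32, [1, 8, 8, 3])
    sub_map = network(input_patch)  # shape: [1, 2, 2, 1]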

I've considered these solutions:

The ultimate goal is to apply this small network to big images (e.g. 1024x1024) in an efficient way, since my current model progressively downscales the images and wouldn't fit in memory due to the huge number of parameters.

Can anyone help me to get a better understanding of what I am missing?

Thank you

Upvotes: 2

Views: 280

Answers (1)

EdoG

Reputation: 355

I found an interesting video by Andrew Ng on exactly how to implement a sliding window using a convolutional layer. My problem was that I was thinking of the number of layers as something dependent on a fixed input/output shape, while it should be the opposite: the architecture is fixed, and the output shape follows from the input shape.

In principle, a saved model only needs to contain the learned filters for each layer, and these remain valid as long as the filter shapes are compatible with the layers' input/output depths. Thus, feeding the network an input with a different (i.e. bigger) spatial resolution produces a correspondingly bigger output map, which is equivalent to applying the network to a sliding window sweeping across the input image.
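A minimal sketch of the idea (TF 1.4; the layer widths are placeholders): build the graph once with the spatial dimensions left unspecified, and the very same filters then apply to any input size:

    import numpy as np
    import tensorflow as tf

    # Spatial dims left as None: the learned filters only constrain the
    # channel depths, so they work at any resolution.
    images = tf.placeholder(tf.float32, [None, None, None, 3])

    with tf.variable_scope('fcn'):
        h = tf.layers.conv2d(images, 16, 3, padding='valid', activation=tf.nn.relu)
        h = tf.layers.conv2d(h, 16, 3, padding='valid', activation=tf.nn.relu)
        out = tf.layers.conv2d(h, 1, 3, padding='valid')

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # An 8x8 patch gives a 2x2 map...
        print(sess.run(out, {images: np.zeros((1, 8, 8, 3))}).shape)    # (1, 2, 2, 1)
        # ...and a 64x64 image gives a 58x58 map with the same weights,
        # equivalent to sweeping the 8x8 network across it with stride 1.
        print(sess.run(out, {images: np.zeros((1, 64, 64, 3))}).shape)  # (1, 58, 58, 1)

Note that with VALID padding the output map is slightly smaller than the input; SAME padding would preserve the spatial size, at the cost of border effects.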

Upvotes: 0
