Combined GRU and CNN network always returns the same value for all inputs

Question

I am trying to train a combined CNN and GRU/LSTM to count, in a series of pictures, the number of objects that move and the number of objects that do not move. To do this, I use a CNN to process the images and then feed its output into a GRU. My problem is that the GRU always returns (almost) the same value for every input sequence. What could be the reasons for that?

I have already tried different learning rates and adding linear layers after the GRU.

My network:


    import torch.nn as nn
    import torch.nn.functional as F


    class GRU(nn.Module):
        def __init__(self, **kwargs):
            super().__init__()
            self.n_class = int(kwargs.get("n_class"))
            self.seq_length = int(kwargs.get("seq_length"))
            self.input_shape = int(kwargs.get("input_shape"))
            self.n_channels = int(kwargs.get("n_channels"))
            # the seq_length frames of one sequence are stacked along the channel axis
            self.conv1 = nn.Conv2d(in_channels=1 * self.seq_length, out_channels=4 * self.seq_length, kernel_size=5)
            self.conv2 = nn.Conv2d(in_channels=4 * self.seq_length, out_channels=8 * self.seq_length, kernel_size=5)
            self.conv3 = nn.Conv2d(in_channels=8 * self.seq_length, out_channels=16 * self.seq_length, kernel_size=5)
            self.rnn = nn.GRU(
                input_size=self.seq_length,
                hidden_size=64,
                num_layers=1,
                batch_first=True)
            self.linear = nn.Linear(64, 2)

        def forward(self, t):
            # first conv layer
            t = self.conv1(t)
            t = F.relu(t)
            t = F.max_pool2d(t, kernel_size=2, stride=2)
            # second conv layer
            t = self.conv2(t)
            t = F.relu(t)
            t = F.max_pool2d(t, kernel_size=4, stride=4)
            # third conv layer
            t = self.conv3(t)
            t = F.relu(t)
            t = F.max_pool2d(t, kernel_size=3, stride=3)
            # assumes 20x20 feature maps: (batch, 16 * seq_length, 20, 20) -> (batch, seq_length, 6400)
            t = t.reshape(-1, self.seq_length, 16 * 20 ** 2)
            # (batch, seq_length, 6400) -> (batch, 6400, seq_length)
            t = t.permute(0, 2, 1)
            t, h_n = self.rnn(t)
            # regress the two counts from the GRU output at the last step
            t = self.linear(t[:, -1])
            return t
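
A note on shapes in the network above: after the `reshape` and `permute`, the tensor is `(batch, 6400, seq_length)`, so with `batch_first=True` the GRU steps over 6400 feature positions and sees `seq_length` values per step. The frames are also stacked as channels, so the convolutions already mix them before the GRU runs. If the intent is for the GRU to step through the frames, one possible arrangement is to run the same CNN on every frame and feed one feature vector per frame to the GRU. The sketch below only illustrates that idea and is not the original network: the class name `FrameCNNGRU`, the layer sizes, the adaptive pooling, and the `(batch, seq, channels, height, width)` input layout are all assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F


class FrameCNNGRU(nn.Module):
    """Hypothetical variant: shared CNN per frame, GRU over the frame axis."""

    def __init__(self, n_channels=1, feat_hw=4, hidden_size=64, n_out=2):
        super().__init__()
        self.conv1 = nn.Conv2d(n_channels, 4, kernel_size=5)
        self.conv2 = nn.Conv2d(4, 8, kernel_size=5)
        # fixed spatial output size, so the GRU input size does not depend on image size
        self.pool = nn.AdaptiveAvgPool2d(feat_hw)
        self.rnn = nn.GRU(input_size=8 * feat_hw ** 2, hidden_size=hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, n_out)

    def forward(self, x):
        b, s, c, h, w = x.shape
        # fold the frame axis into the batch axis so every frame goes through the same CNN
        t = x.reshape(b * s, c, h, w)
        t = F.max_pool2d(F.relu(self.conv1(t)), kernel_size=2)
        t = F.max_pool2d(F.relu(self.conv2(t)), kernel_size=2)
        t = self.pool(t)
        # unfold again: one feature vector per frame, frames along dim 1 (time)
        t = t.reshape(b, s, -1)
        out, h_n = self.rnn(t)
        # regress the two counts from the output at the last frame
        return self.linear(out[:, -1])


x = torch.randn(3, 5, 1, 32, 32)  # 3 sequences of 5 single-channel 32x32 frames
y = FrameCNNGRU()(x)
print(y.shape)
```

Printing the intermediate shapes inside `forward` is a cheap way to confirm that the dimension the GRU iterates over really is the frame axis.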

And this is my training loop:

    for epoch in range(number_epochs):
        for batch in get_batch_generator(batch_size, rootdir, seq_length=seq_length):
            current_batch = batch[0].cuda()
            current_labels = batch[1].cuda()
            pre = nw(current_batch)
            loss_func = torch.nn.MSELoss()
            loss = loss_func(pre, current_labels)
            loss.backward()
            optimizer = optim.Adam(nw.parameters(), lr=learning_rate)
            optimizer.step()

Here is an example of the output. Actual labels:

    tensor([[ 4.,  5.],
            [10.,  0.],
            [10.,  0.],
            [ 2.,  9.],
            [ 5.,  1.],
            [10.,  0.]], device='cuda:0')

Prediction of my network:

    tensor([[2.0280, 1.1517],
            [2.0175, 1.1593],
            [2.0323, 1.1434],
            [2.0333, 1.1557],
            [2.0200, 1.1546],
            [2.0069, 1.1687]], device='cuda:0', grad_fn=)

So the network predicts almost the same pair of values (moving and non-moving objects) for every input, which should not be the case.
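
Two details of the training loop may be worth checking, independent of the architecture. The optimizer is created anew for every batch, which discards Adam's running moment estimates, and the gradients are never zeroed, so PyTorch keeps accumulating them across batches. Whether this alone explains the constant predictions cannot be told from the post. The sketch below shows the usual loop structure on a toy problem: the small `model`, the synthetic `inputs` and `targets`, and the learning rate are stand-ins chosen for illustration so that it runs on the CPU.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical stand-in for the CNN + GRU network
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

# Synthetic regression data (assumption): targets are a linear function of the inputs
inputs = torch.randn(64, 8)
targets = inputs @ torch.randn(8, 2)

loss_func = nn.MSELoss()
# Created once, outside the loop, so Adam keeps its state between steps
optimizer = optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(200):
    optimizer.zero_grad()            # clear gradients left over from the previous step
    pre = model(inputs)
    loss = loss_func(pre, targets)
    loss.backward()                  # compute fresh gradients for this step
    optimizer.step()                 # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Comparing the first and last recorded loss, and checking that the predictions differ between inputs, gives a quick sanity check before moving back to the full network.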

