jb4earth

Reputation: 188

PyTorch CNN never converges (implementation issue suspected)

I am having trouble getting this network to work as desired. I have tried many iterations of this model and still cannot get a reasonable error: it never fits, and I can't even get it to overfit.

Where have I gone wrong? Any help would be greatly appreciated.

For reference, there are 12 input 'images' (actually water surface elevations at 9 stations in an estuary), each of shape (49, 9), and 12 labels, each of shape (1, 9).
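
To make the shapes concrete, here is a minimal sketch that uses random stand-in tensors in place of the real data. The name `get_xy_sketch` is hypothetical: it only imitates what `get_xy` appears to return according to the shape comments in the training loop, and is not the implementation from the repository.

```python
import torch

# Random stand-ins; the real data lives in the linked repository.
input_data = torch.randn(12, 49, 9)   # 12 samples, each 49 rows x 9 stations
output_data = torch.randn(12, 1, 9)   # 12 labels, each 1 row x 9 stations

def get_xy_sketch(inputs, outputs, idx):
    """Hypothetical stand-in for get_xy: pick sample idx and add batch and channel dims."""
    x = inputs[idx].unsqueeze(0).unsqueeze(0)   # shape (1, 1, 49, 9)
    y = outputs[idx].unsqueeze(0).unsqueeze(0)  # shape (1, 1, 1, 9)
    return x, y

x, y = get_xy_sketch(input_data, output_data, 0)
print(x.shape, y.shape)
```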

Full examples with data can be found at https://gitlab.com/jb4earth/effonn/

  import torch
  import torch.nn as nn

  class Net(torch.nn.Module):
      def __init__(self, kernel_size):
          super(Net, self).__init__()
          mid_size = (49*49*9)
          self.predict = torch.nn.Sequential(
              nn.Conv2d(
                          in_channels=1,
                          out_channels=mid_size,
                          kernel_size=kernel_size,
                          stride=1,
                          padding=(0, 0)
                      ),
              nn.ReLU(),
              nn.MaxPool2d(1),
              nn.ReLU(),
              nn.Conv2d(
                          in_channels=mid_size,
                          out_channels=1,
                          kernel_size=kernel_size,
                          stride=1,
                          padding=(0, 0)
                      ),
              nn.ReLU()
          )


      def forward(self, x):
          x = self.predict(x)
          return x

  def train_network(x,y,optimizer,loss_func):
      prediction = net(x)    
      loss = loss_func(prediction, y.squeeze())     
      optimizer.zero_grad()  
      loss.backward()     
      optimizer.step()    
      return prediction, loss


  net = Net((1,1))
  optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
  loss_func = torch.nn.MSELoss()
  cnt = 0
  while True:  # runs until interrupted, cycling over the 12 samples
      # get_xy in place of DataLoader
      (x,y) = get_xy(input_data,output_data,cnt)
      # x.shape is 1,1,49,9
      # y.shape is 1,1,1,9

      # train and predict
      (prediction,loss) = train_network(x,y,optimizer,loss_func)

      # prediction shape different than desired so averaging all results
      prediction_ = torch.mean(prediction)

      # only 12 input/output pairs, so cycle through them
      cnt += 1
      if cnt > 11:
          cnt = 0
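
Regarding the "prediction shape different than desired" comment above: with (1, 1) kernels and no padding, the output keeps the (49, 9) spatial size of the input, so it cannot match a (1, 9) label directly. When the shapes differ, `MSELoss` broadcasts the target and emits a warning. The sketch below checks the shapes and shows one possible way to collapse the 49 rows, a (49, 1) kernel in the last layer. The channel count of 16 and the missing final ReLU are assumptions made for illustration, not the original architecture.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 49, 9)  # stand-in for one input
y = torch.randn(1, 1, 1, 9)   # stand-in for one label

# 1x1 kernels with no padding leave the spatial size unchanged.
same_size = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=(1, 1)),
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=(1, 1)),
)
print(same_size(x).shape)  # torch.Size([1, 1, 49, 9])

# Assumed alternative: a (49, 1) kernel spans every row, so the output has one row.
collapsing = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=(1, 1)),
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=(49, 1)),
)
pred = collapsing(x)
print(pred.shape)  # torch.Size([1, 1, 1, 9])

# Prediction and label now have identical shapes, so no broadcasting is involved.
loss = nn.MSELoss()(pred, y)
```

Because the sketch ends without a ReLU, its outputs can be negative. Whether that matters depends on the range of the labels.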

Upvotes: 0

Views: 66

Answers (1)

basilisk

Reputation: 1277

Take a look here, this looks suspicious: you calculate the loss and then zero the gradients. The conventional order is to call optimizer.zero_grad() at the top of the training step, before the forward pass, so try moving it there. Strictly speaking, zero_grad() only has to run before loss.backward(), so this may not be the whole story. I couldn't reproduce your example, so this is only a guess at the error.

  loss = loss_func(prediction, y.squeeze())     
  optimizer.zero_grad()   # switch this to the top  
  loss.backward()     
  optimizer.step() 
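
A quick way to test this guess is to compare the gradients produced by each ordering on a toy model. The sketch below is an assumption-laden stand-in (a small `Linear` layer with SGD, not the network from the question). It runs one throwaway backward pass so that stale gradients exist, then tries three orderings.

```python
import torch

def grad_for(order):
    torch.manual_seed(0)  # identical initial weights for every ordering
    model = torch.nn.Linear(3, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_func = torch.nn.MSELoss()
    x, y = torch.ones(4, 3), torch.zeros(4, 1)

    # Leave stale gradients behind, as a previous iteration would.
    loss_func(model(x), y).backward()

    if order == "zero_before_loss":
        optimizer.zero_grad()
        loss = loss_func(model(x), y)
    elif order == "zero_after_loss":
        loss = loss_func(model(x), y)
        optimizer.zero_grad()
    else:  # "never_zero": gradients accumulate on top of the stale ones
        loss = loss_func(model(x), y)
    loss.backward()
    return model.weight.grad.clone()

g_before = grad_for("zero_before_loss")
g_after = grad_for("zero_after_loss")
g_never = grad_for("never_zero")
print(torch.allclose(g_before, g_after))      # True
print(torch.allclose(g_never, 2 * g_before))  # True
```

In this toy setup both placements of zero_grad() give the same gradient, and only skipping the call changes the result, so it is worth checking other parts of the training step as well.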

Upvotes: 1
