Jaswinder Singh

Reputation: 33

PyTorch RNN model not learning anything

Task: predicting whether the provided disaster tweets are real or not. I have already converted my text data into tensors and loaded them into train_loader. All the relevant code is below.

My Model Architecture

class RealOrFakeLSTM(nn.Module):
    
    def __init__(self, input_size, output_size, embedding_dim, hidden_dim, n_layers, bidirec, drop_prob):
        super().__init__()
        self.output_size=output_size
        self.n_layers=n_layers
        self.hidden_dim=hidden_dim
        self.bidirec=True;
        self.embedding=nn.Embedding(vocab_size, embedding_dim)
        self.lstm1=nn.LSTM(embedding_dim, hidden_dim, n_layers, dropout=drop_prob, batch_first=True, bidirectional=bidirec)
        #self.lstm2=nn.LSTM(hidden_dim, hidden_dim, n_layers, dropout=drop_prob, batch_first=True)
        self.dropout=nn.Dropout(drop_prob)
        self.fc=nn.Linear(hidden_dim, output_size)
        self.sigmoid=nn.Sigmoid()
        
    def forward(self, x):
        batch=len(x)
        hidden1=self.init_hidden(batch)
        #hidden2=self.init_hidden(batch)
        embedd=self.embedding(x)
        lstm_out1, hidden1=self.lstm1(embedd, hidden1)
        #lstm_out2, hidden2=self.lstm2(lstm_out1, hidden2)
        lstm_out1=lstm_out1.contiguous().view(-1, self.hidden_dim) # make it lstm_out2, if you un comment the other lstm cell.
        out=self.dropout(lstm_out1)
        out=self.fc(out)
        sig_out=self.sigmoid(out)
        sig_out=sig_out.view(batch, -1)
        sig_out=sig_out[:, -1] 
        return sig_out
    
    def init_hidden(self, batch):
        if (train_on_gpu):
          if self.bidirec==True:
            hidden=(torch.zeros(self.n_layers*2, batch, self.hidden_dim).cuda(),torch.zeros(self.n_layers*2, batch, self.hidden_dim).cuda())
          else:
            hidden=(torch.zeros(self.n_layers, batch, self.hidden_dim).cuda(),torch.zeros(self.n_layers, batch, self.hidden_dim).cuda())
        else:
          if self.bidirec==True:
            hidden=(torch.zeros(self.n_layers*2, batch, self.hidden_dim),torch.zeros(self.n_layers*2, batch, self.hidden_dim))
          else:
            hidden=(torch.zeros(self.n_layers, batch, self.hidden_dim),torch.zeros(self.n_layers, batch, self.hidden_dim))
        return hidden

Hyperparameters and training

learning_rate=0.005
epochs=50
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding
output_size = 2
embedding_dim = 300
hidden_dim = 256
n_layers = 2
batch_size=23
net=RealOrFakeLSTM(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, True, 0.3)
net.to(device)
criterion=nn.BCELoss()
optimizer=torch.optim.Adam(net.parameters(),lr=learning_rate)
net.train()
loss_arr=np.array([])
lossPerEpoch=np.array([])
for i in range(epochs):
  total_loss=0;
  for input,label in train_loader:
    if train_on_gpu:
      input=input.to(device)
      label=label.to(device)
    optimizer.zero_grad()
    input=input.clone().detach().long()
    out=net(input)
    loss=criterion(out.squeeze(),label.float())
    loss_arr=np.append(loss_arr,loss.cpu().detach().numpy())
    loss.backward()
    optimizer.step()
    total_loss+=loss
  total_loss=total_loss/len(train_loader)
  lossPerEpoch=np.append(lossPerEpoch,total_loss.cpu().detach().numpy())
  print("Epoch ",i,": ",total_loss)
  torch.save(net.state_dict(), Path+"/RealOrFakeLSTM.pt")
  torch.save(net, Path+"/RealOrFakeLSTM.pth")
current_time=str(time.time())
torch.save(net.state_dict(), Path+"/pt/RealOrFakeLSTM"+'_pt_'+current_time+".pt")
torch.save(net, Path+"/pth/RealOrFakeLSTM"+'_pth_'+current_time+".pth")

The total loss values are almost the same in every epoch, and all the predicted probabilities on the test dataset are exactly the same. I am quite new to this, so I am tuning hyperparameters more or less by brute force, but nothing seems to work. I think the problem is not with the architecture but with the training part, since all the predictions are exactly the same.

Upvotes: 2

Views: 1727

Answers (2)

Gaussian Prior

Reputation: 786

From what I can tell, you are calling hidden1 = self.init_hidden(batch) in every forward pass. That does not look correct: re-initializing the hidden state on every forward pass could explain the behavior you described.
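One quick way to test this hypothesis is to compare an LSTM called with explicit zero states against the same LSTM called with no states at all. The following is a minimal sketch with made-up sizes (input size 8, hidden size 16, batch of 4, sequence length 5), not the model from the question. According to the PyTorch documentation, nn.LSTM uses zero initial states when none are passed, so the two outputs should match.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small sizes, chosen only for illustration.
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2,
               batch_first=True, bidirectional=True)
x = torch.randn(4, 5, 8)  # (batch, seq_len, features)

# Explicit zero states: (num_layers * num_directions, batch, hidden_size).
h0 = torch.zeros(2 * 2, 4, 16)
c0 = torch.zeros(2 * 2, 4, 16)

out_explicit, _ = lstm(x, (h0, c0))
out_default, _ = lstm(x)  # no states passed: PyTorch defaults to zeros

same = torch.allclose(out_explicit, out_default)
print("explicit zeros == default:", same)
print("output shape:", tuple(out_default.shape))
```

If the outputs match, zeroing the state per batch only resets the recurrent state and leaves the learned weights untouched. Also note that the last output dimension is 2 * hidden_size for a bidirectional LSTM, which is worth comparing against the view(-1, self.hidden_dim) call in the question.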

Upvotes: 1

Joseph Budin

Reputation: 1361

The good news is in your own description: "The total loss values are all almost same." That means they are not always identical, so I think your network does not output constant probabilities. I can see many possible reasons why your training does not work as planned. Unfortunately, without debugging it myself, I cannot say with certainty what is happening. So here are my hypotheses:

  • First, the hurtful one: maybe the task is too hard for a neural network. Have you tried classifying the tweets by hand, and did you find it easy? There is no easy solution here other than accepting that machine learning is not a magic wand and cannot solve everything.
  • Maybe your learning rate is too high (or too low). Try launching the training with values ranging from 10^-5 to 100, multiplying by 10 each time. There is no need to let each run go on for long; just check how much your loss changes from one iteration to the next.
  • Maybe your training set is unbalanced: if you have 95% True inputs and 5% False ones, your network will naturally start by predicting True every time (with logits corresponding to a probability of ~95%). In this case, try to balance it artificially (at least temporarily): you can do so by duplicating the False examples (ideally not in memory but directly in the code) or by removing some True examples (ideally also only in the code, not in the database).
  • Maybe your architecture is too small (or too big). Try adding (or removing) layers. I would start by removing layers, since smaller networks tend to learn faster.
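The learning-rate sweep from the second point can be sketched as follows. This is only an illustration on a tiny synthetic logistic-regression problem: the data, the model, and the helper name loss_change are all made up here, and in practice you would plug in your own net and train_loader instead.

```python
import torch
import torch.nn as nn

def loss_change(lr, steps=20):
    """Train a tiny stand-in model briefly; return (first_loss, last_loss)."""
    torch.manual_seed(0)                    # same data and init for every lr
    x = torch.randn(64, 10)                 # synthetic inputs
    y = (x[:, 0] > 0).float()               # synthetic binary labels
    model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCELoss()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(x).squeeze(1), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses[0], losses[-1]

# Sweep lr over 10^-5 ... 10^2, multiplying by 10 each time.
results = {}
for exponent in range(-5, 3):
    first, last = loss_change(10.0 ** exponent)
    results[exponent] = (first, last)
    print(f"lr=1e{exponent:+d}  first loss={first:.4f}  last loss={last:.4f}")
```

A learning rate whose loss barely moves is probably too low, and one whose loss jumps around or grows is probably too high; the useful range is usually somewhere in between.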

Although testing all of these hypotheses may help, above all I encourage you to understand the outputs of your network. Print the outputs of the final sigmoid layer: what probabilities does it output, and can you guess why? (Sometimes you just can't, but often it is possible, as in the 95/5 case mentioned earlier in this answer.) Check that the loss is what you expect it to be given this output (compute it manually if need be). In general, be curious about how your code behaves, and check that it works as intended everywhere you can interpret your variables.

It's one of the hard parts of machine learning, and sailing through it is not easy ;) Good luck!
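As an example of computing the loss by hand, here is a small sketch in plain Python. The 95/5 label split is hypothetical, and bce is a hand-written helper for the mean binary cross-entropy, not a PyTorch function. It gives two reference values to compare your training loss against: the loss of a model that always outputs 0.5, and the loss of a model that always outputs the fraction of positive labels.

```python
import math
from collections import Counter

def bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy, computed by hand."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)       # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# Hypothetical unbalanced label set: 95 positives, 5 negatives.
labels = [1] * 95 + [0] * 5
counts = Counter(labels)
pos_frac = counts[1] / len(labels)

# Loss if the network always outputs 0.5 (it has learned nothing).
coin_flip_loss = bce([0.5] * len(labels), labels)

# Loss if the network always outputs the class prior (here 0.95).
constant_loss = bce([pos_frac] * len(labels), labels)

print("class counts:", dict(counts))
print("loss at constant 0.5:", round(coin_flip_loss, 4))
print("loss at constant prior:", round(constant_loss, 4))
```

If your training loss sits near one of these two values, the network is likely producing the same probability for every input, and the value tells you which constant it has settled on.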

Upvotes: 1
