Reputation: 33
Task: predicting whether provided disaster tweets are real or not. I have already converted my textual data into tensors and then into a train_loader. All the required code is below.
My Model Architecture
import torch
import torch.nn as nn
import numpy as np
import time

class RealOrFakeLSTM(nn.Module):
    def __init__(self, input_size, output_size, embedding_dim, hidden_dim, n_layers, bidirec, drop_prob):
        super().__init__()
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        self.bidirec = bidirec
        self.embedding = nn.Embedding(input_size, embedding_dim)
        self.lstm1 = nn.LSTM(embedding_dim, hidden_dim, n_layers, dropout=drop_prob,
                             batch_first=True, bidirectional=bidirec)
        #self.lstm2 = nn.LSTM(hidden_dim, hidden_dim, n_layers, dropout=drop_prob, batch_first=True)
        self.dropout = nn.Dropout(drop_prob)
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        batch = len(x)
        hidden1 = self.init_hidden(batch)
        #hidden2 = self.init_hidden(batch)
        embedd = self.embedding(x)
        lstm_out1, hidden1 = self.lstm1(embedd, hidden1)
        #lstm_out2, hidden2 = self.lstm2(lstm_out1, hidden2)
        lstm_out1 = lstm_out1.contiguous().view(-1, self.hidden_dim)  # use lstm_out2 instead if you uncomment the second LSTM
        out = self.dropout(lstm_out1)
        out = self.fc(out)
        sig_out = self.sigmoid(out)
        sig_out = sig_out.view(batch, -1)
        sig_out = sig_out[:, -1]  # keep only the last time step
        return sig_out

    def init_hidden(self, batch):
        # Zero-initialized (h_0, c_0); the first dimension doubles for a bidirectional LSTM.
        num_directions = 2 if self.bidirec else 1
        shape = (self.n_layers * num_directions, batch, self.hidden_dim)
        hidden = (torch.zeros(shape), torch.zeros(shape))
        if train_on_gpu:
            hidden = tuple(h.cuda() for h in hidden)
        return hidden
Hyperparameters and training
learning_rate = 0.005
epochs = 50
vocab_size = len(vocab_to_int) + 1  # +1 for the 0 padding
output_size = 2
embedding_dim = 300
hidden_dim = 256
n_layers = 2
batch_size = 23

net = RealOrFakeLSTM(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, True, 0.3)
net.to(device)
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

net.train()
loss_arr = np.array([])
lossPerEpoch = np.array([])
for i in range(epochs):
    total_loss = 0
    for inputs, labels in train_loader:
        if train_on_gpu:
            inputs = inputs.to(device)
            labels = labels.to(device)
        optimizer.zero_grad()
        inputs = inputs.clone().detach().long()
        out = net(inputs)
        loss = criterion(out.squeeze(), labels.float())
        loss_arr = np.append(loss_arr, loss.cpu().detach().numpy())
        loss.backward()
        optimizer.step()
        total_loss += loss
    total_loss = total_loss / len(train_loader)
    lossPerEpoch = np.append(lossPerEpoch, total_loss.cpu().detach().numpy())
    print("Epoch ", i, ": ", total_loss)

torch.save(net.state_dict(), Path + "/RealOrFakeLSTM.pt")
torch.save(net, Path + "/RealOrFakeLSTM.pth")
current_time = str(time.time())
torch.save(net.state_dict(), Path + "/pt/RealOrFakeLSTM" + "_pt_" + current_time + ".pt")
torch.save(net, Path + "/pth/RealOrFakeLSTM" + "_pth_" + current_time + ".pth")
The total loss values are all almost the same, and all the outcome probabilities on the test dataset are exactly the same. I am quite new to this, so for hyperparameter tuning I am more or less brute-forcing, but nothing seems to work. I think my problem is not with the architecture but with the training part, since all the predictions are exactly the same.
Upvotes: 2
Views: 1727
Reputation: 786
From what I can tell, you are calling hidden1=self.init_hidden(batch) in every forward pass. That should not be correct: re-initializing the hidden state on every forward pass would explain the behavior you described.
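If that diagnosis fits, here is a minimal sketch of one way to restructure the model so that no manual hidden-state initialization happens inside forward(). This is an illustrative variant, not the poster's code: it also sizes the linear layer for the bidirectional output (2 * hidden_dim) and uses a single output unit to match BCELoss.

import torch
import torch.nn as nn

class RealOrFakeLSTMSketch(nn.Module):
    # Illustrative variant: no manual init_hidden() call in forward().
    def __init__(self, vocab_size, embedding_dim=300, hidden_dim=256,
                 n_layers=2, drop_prob=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            dropout=drop_prob, batch_first=True,
                            bidirectional=True)
        self.dropout = nn.Dropout(drop_prob)
        # A bidirectional LSTM output is 2 * hidden_dim wide, and one
        # output unit is enough for a binary target with BCELoss.
        self.fc = nn.Linear(hidden_dim * 2, 1)

    def forward(self, x):
        embedd = self.embedding(x)
        # Passing no hidden state lets nn.LSTM zero-initialize
        # (h_0, c_0) internally for each batch.
        lstm_out, _ = self.lstm(embedd)
        out = self.dropout(lstm_out[:, -1, :])  # last time step only
        return torch.sigmoid(self.fc(out)).squeeze(1)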
Upvotes: 1
Reputation: 1361
The good news here is: "the total loss values are all almost the same" — that means they are not always the same, and therefore I think your network does not output constant probabilities! I can see many possible reasons why your training does not work as planned. Unfortunately, without debugging it myself I cannot say with certainty what happens, so here is my main hypothesis:

If your dataset is unbalanced (say, 95% True inputs and 5% False ones), then your network will naturally start by predicting True every time (with logits corresponding to a probability of ~95%). In this case, try to artificially balance it, at least temporarily: you can do so by duplicating the False examples (ideally not in memory but directly in the code) or by removing some True examples (again, ideally only in the code, not in the database).

Although testing this hypothesis may help you, I above all encourage you to understand the outputs of your network. Print the outputs of the final sigmoid layer: what probability does it output, and can you guess why? (Sometimes you just can't, but often it is possible, as in the 95/5 case above.) Check that the loss is what you expect given this output (compute it manually if need be). In general, be curious about how your code behaves, and check that it works as intended everywhere you can interpret your variables.
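Here is a minimal sketch of the "balance it directly in the code" idea using PyTorch's WeightedRandomSampler, which oversamples the minority class at batch-creation time instead of duplicating rows in the dataset. The names train_data and train_labels are assumed stand-ins for your existing tensors.

import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Assumed placeholders: train_data is a LongTensor of token ids,
# train_labels is a tensor of 0/1 targets (as fed to BCELoss).
dataset = TensorDataset(train_data, train_labels)

# Weight each sample by the inverse frequency of its class so that
# minority-class examples are drawn more often during training.
class_counts = torch.bincount(train_labels.long())
sample_weights = 1.0 / class_counts[train_labels.long()].float()

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)
train_loader = DataLoader(dataset, batch_size=23, sampler=sampler)

The rest of the training loop in the question can then stay unchanged, since only the order and frequency with which examples are drawn differs.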
It's one of the hard parts of machine learning; sailing through it is not easy ;) Good luck!
Upvotes: 1