Ayush

Reputation: 31

Loss doesn't decrease when training the PyTorch RNN

Here is the RNN network I designed for sentiment analysis.

import torch
import torch.nn as nn
from torch.autograd import Variable

class rnn(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.h2h = nn.Linear(hidden_size , hidden_size)
        self.relu = nn.Tanh()
        self.sigmoid = nn.LogSigmoid()

    def forward(self, input, hidden):
        hidden_new = self.relu(self.i2h(input)+self.h2h(hidden))
        output = self.h2o(hidden)
        output = self.sigmoid(output)
        return output, hidden_new

    def init_hidden(self):
        return Variable(torch.zeros(1, self.hidden_size))

Then, I create and train the network as:

RNN = rnn(50, 50, 1)
learning_rate = 0.0005
criteria = nn.MSELoss()
optimizer = optim.Adam(RNN.parameters(), lr=learning_rate)
hidden = RNN.init_hidden()
epochs = 2
for epoch in range(epochs):
    for i in range(len(train['Phrase'])):
        input = convert_to_vectors(train['Phrase'][i])
        for j in range(len(input)):
            temp_input = Variable(torch.FloatTensor(input[j]))
            output, hidden = RNN(temp_input, hidden)
        temp_output = torch.FloatTensor([np.float64(train['Sentiment'][i])/4])
        loss = criteria( output, Variable(temp_output))
        loss.backward(retain_graph = True)
        if (i%20 == 0):
            print('Current loss is ', loss)

The problem is that the network's loss isn't decreasing. It increases, then decreases, and so on; it isn't stable at all. I tried using a smaller learning rate, but it doesn't seem to help.

Why is this happening and how can I rectify this?

Upvotes: 1

Views: 2184

Answers (2)

dedObed

Reputation: 1363

You just need to call optimizer.step() after you do loss.backward().

Which, by the way, illustrates a common misconception: backpropagation is not a learning algorithm, it's just a clever way of computing the gradient of the loss w.r.t. your parameters. You then use some variant of gradient descent (e.g. plain SGD, AdaGrad, etc.; in your case, Adam) to update the weights given those gradients.
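For reference, here is a minimal sketch of the inner loop with the optimizer step added. It also zeroes the gradients and re-initializes the hidden state for each phrase, which are common companion fixes rather than part of the point above, and it drops the Variable wrappers (a no-op in recent PyTorch). convert_to_vectors, train, RNN, criteria and optimizer are assumed to be defined as in the question:

for epoch in range(epochs):
    for i in range(len(train['Phrase'])):
        hidden = RNN.init_hidden()                    # start each phrase from a fresh hidden state
        input = convert_to_vectors(train['Phrase'][i])
        for j in range(len(input)):
            temp_input = torch.FloatTensor(input[j])
            output, hidden = RNN(temp_input, hidden)
        temp_output = torch.FloatTensor([np.float64(train['Sentiment'][i]) / 4])

        optimizer.zero_grad()                         # clear gradients left over from the previous step
        loss = criteria(output, temp_output)
        loss.backward()                               # compute gradients of the loss w.r.t. the parameters
        optimizer.step()                              # actually update the weights -- the missing call

        if i % 20 == 0:
            print('Current loss is ', loss.item())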

Upvotes: 1

chwlsunny

Reputation: 1

There are a few things that I think may help. First, in the rnn class, you could use "super(rnn, self).__init__()" in place of "super().__init__()".

Second, the attribute name should match the function it holds, so use "self.tanh = nn.Tanh()" instead of "self.relu = nn.Tanh()". Also, the sigmoid should be the plain logistic function 1/(1+exp(-x)), not LogSigmoid, so replace "self.sigmoid = nn.LogSigmoid()" with "self.sigmoid = nn.Sigmoid()". Third, if you use the RNN for classification, the output should be passed through a softmax, so add two statements: "self.softmax = nn.Softmax()" and "output = self.softmax(output)".
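Applied to the question's module, those suggestions might look roughly like this (a sketch only; the softmax lines, shown commented out, would apply to a multi-class output, whereas the question uses output_size=1 with a single sigmoid output; the Variable wrapper is omitted since it is unnecessary in recent PyTorch):

import torch
import torch.nn as nn

class rnn(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(rnn, self).__init__()                  # explicit form of the super() call
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.h2h = nn.Linear(hidden_size, hidden_size)
        self.tanh = nn.Tanh()                        # name now matches the activation it wraps
        self.sigmoid = nn.Sigmoid()                  # plain sigmoid instead of LogSigmoid
        # self.softmax = nn.Softmax(dim=1)           # for a multi-class output instead

    def forward(self, input, hidden):
        hidden_new = self.tanh(self.i2h(input) + self.h2h(hidden))
        output = self.sigmoid(self.h2o(hidden))
        # output = self.softmax(self.h2o(hidden))    # multi-class variant
        return output, hidden_new

    def init_hidden(self):
        return torch.zeros(1, self.hidden_size)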

Upvotes: 0
