Farhang Amaji

Reputation: 973

difference in code between using nn.RNN or not

Hi, I'm new to RNNs. I found the "NLP From Scratch" RNN tutorial in the official PyTorch tutorials, and I think it is called "from scratch" because it does not use the built-in nn.RNN module, i.e. there is no line like self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True) in the def __init__(self, input_size, hidden_size, output_size): part. How would the code have to change if nn.RNN were used?

import torch
import torch.nn as nn

class RNN(nn.Module):
    # implement RNN from scratch rather than using nn.RNN
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)
        
    def forward(self, input_tensor, hidden_tensor):
        combined = torch.cat((input_tensor, hidden_tensor), 1)
        
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden
    
    def init_hidden(self):
        return torch.zeros(1, self.hidden_size)

def train(line_tensor, category_tensor):
    # feed the line through the RNN one character at a time,
    # then backpropagate the loss computed on the final output
    hidden = rnn.init_hidden()
    
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)
        
    loss = criterion(output, category_tensor)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    return output, loss.item()

To put the question another way: how would the code be rewritten using self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True), or, if that is not possible, what does the internal structure of nn.RNN look like?

Upvotes: 1

Views: 384

Answers (1)

Sudhanshu

Reputation: 732

This model follows the way RNNs were implemented before the autograd module was introduced; it is a pure, from-scratch implementation of an RNN. In this example the hidden state and the gradients are handled entirely by the computation graph.

def init_hidden(self):
    return torch.zeros(1, self.hidden_size)

The lines above initialize the hidden state (all zeros at first). After the first step we get the output and the next hidden state, which is then fed into the following step.

All of this is handled by the computation graph.
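For completeness, here is a rough sketch of how the same name classifier could be written around nn.RNN instead of the hand-rolled linear layers. The class name RNNWithModule is only illustrative, and it assumes line_tensor has shape (seq_len, 1, input_size), which is what the tutorial's lineToTensor helper produces; for that reason the default batch_first=False is kept rather than the batch_first=True from the question. Internally, nn.RNN applies the recurrence h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh) at every step, which is essentially what the i2h layer above does by hand.

import torch
import torch.nn as nn

class RNNWithModule(nn.Module):
    # sketch: same classifier, but the recurrence is delegated to nn.RNN
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(RNNWithModule, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # nn.RNN runs the tanh recurrence over the whole sequence in one call
        self.rnn = nn.RNN(input_size, hidden_size, num_layers)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor, hidden):
        # line_tensor: (seq_len, 1, input_size), hidden: (num_layers, 1, hidden_size)
        out, hidden = self.rnn(line_tensor, hidden)
        # classify from the output of the last time step
        output = self.softmax(self.h2o(out[-1]))
        return output, hidden

    def init_hidden(self):
        return torch.zeros(self.num_layers, 1, self.hidden_size)

The training function then no longer needs a Python loop over the characters, because nn.RNN consumes the whole sequence at once:

def train(line_tensor, category_tensor):
    hidden = rnn.init_hidden()
    output, hidden = rnn(line_tensor, hidden)   # one call for the whole line
    loss = criterion(output, category_tensor)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return output, loss.item()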

Upvotes: 1
