I'm still fairly new to neural networks, so apologies in advance for any ambiguities in the following.
In a "standard" LSTM implementation for a language task, we have the following (sorry for the very rough sketches):
class LSTM(nn.Module):
    def __init__(self, *args):
        ...
    def forward(self, input, states):
        lstm_in = self.model['embed'](input)
        lstm_out, hidden = self.model['lstm'](lstm_in, states)
        return lstm_out, hidden
Later on, we call upon this model in the training step:
def train(*args):
    for epoch in range(epochs):
        ....
        *init_zero_states
        ...
        out, states = model(input, states)
        ...
    return model
Let's just say that I have 3 sentences as input:
sents = [["The", "sun", "is", "shiny"],
         ["The", "beach", "was", "very", "windy"],
         ["Computer", "broke", "down", "today"]]
model = train(LSTM, sents)
All words in all sentences get converted to embeddings and loaded into the model.
Now the questions:
Does self.model['lstm'] iterate through all words from all sentences and produce one output after every word, or after every sentence?
How does the model distinguish between the 3 sentences? For example, after getting "The", "sun", "is", "shiny", does something (such as the states) in the 'lstm' reset and begin anew?
Is the "out" in the training step after out, states = model(input, states) the output after running all 3 sentences, and hence the combined "information" from all 3 sentences?
Thanks!
When using LSTMs in PyTorch, you usually use the nn.LSTM
module. Here is a quick example, followed by an explanation of what happens inside:
class Model(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, output_size, num_layers):
        super(Model, self).__init__()
        self.embedder = nn.Embedding(vocab_size, embed_size)
        # the LSTM's input size must match the embedding dimension
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=1)
        self.hidden_size = hidden_size
        self.num_layers = num_layers

    def forward(self, x):
        x = self.embedder(x)
        # every time you pass a new sentence into the model you need to create
        # a new hidden state (the LSTM, unlike a plain RNN, needs a tuple of
        # hidden state and cell state)
        batch_size = x.size(0)
        hidden = (torch.zeros(self.num_layers, batch_size, self.hidden_size),
                  torch.zeros(self.num_layers, batch_size, self.hidden_size))
        x, hidden = self.lstm(x, hidden)
        # x contains the output state of every timestep;
        # for classification we mostly just want the last one
        x = x[:, -1]
        x = self.fc(x)
        x = self.softmax(x)
        return x
So, when you take a look at the nn.LSTM
module, you see that all N embedded words are passed into it at once, and you get back all N outputs (one from every timestep). That means that inside the LSTM, it iterates over all words in the sentence embeddings; we just don't see that in the code. It also returns the final hidden state and cell state as a second return value, but you don't have to use them further; in most cases you can just ignore them.
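To make the shapes concrete, here is a small sketch (the sizes are arbitrary, not from the original answer) showing that a single call to nn.LSTM consumes the whole sequence and returns one output per timestep, plus only the final hidden and cell state:
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=1, batch_first=True)

# one sentence of 4 embedded words: (batch=1, seq_len=4, input_size=16)
x = torch.randn(1, 4, 16)

out, (h_n, c_n) = lstm(x)      # hidden state defaults to zeros if not given
print(out.shape)               # torch.Size([1, 4, 32]): one output per word
print(h_n.shape, c_n.shape)    # torch.Size([1, 1, 32]) each: only the final state
So the loop over timesteps happens inside that single call.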
What happens inside, as pseudocode:
def lstm(x):
    # the hidden state starts out as all zeros
    hiddenstate = init_with_zeros()
    outputs, hiddenstates = [], []
    for e in x:
        # feed one embedded word at a time, carrying the hidden state forward
        output, hiddenstate = neuralnet(e, hiddenstate)
        outputs.append(output)
        hiddenstates.append(hiddenstate)
    return outputs, hiddenstates
sentence = ["the", "sun", "is", "shiny"]
sentence = embedding(sentence)
outputs, hiddenstates = lstm(sentence)
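Regarding how the three sentences stay separate: if you feed them one after another, the hidden state is simply re-initialized for each sentence. A rough sketch, reusing the pseudocode names from above:
sentences = [["The", "sun", "is", "shiny"],
             ["The", "beach", "was", "very", "windy"],
             ["Computer", "broke", "down", "today"]]

for sent in sentences:
    # each call to lstm() starts from a fresh zero hidden state, so no
    # information carries over from one sentence to the next
    outputs, hiddenstates = lstm(embedding(sent))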