MoneyBall

Reputation: 2563

PyTorch LSTM tutorial: initializing Variable

I am going through the PyTorch tutorial for LSTMs, and here's the code they use:

import torch
import torch.autograd as autograd
import torch.nn as nn

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [autograd.Variable(torch.randn((1, 3)))
          for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state.
hidden = (autograd.Variable(torch.randn(1, 1, 3)),
          autograd.Variable(torch.randn((1, 1, 3))))
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

The variable hidden is initialized as a tuple, and printing it gives:

(Variable containing:
(0 ,.,.) = 
  0.4251 -1.2328 -0.6195
[torch.FloatTensor of size 1x1x3]
, Variable containing:
(0 ,.,.) = 
  1.5133  1.9954 -0.6585
[torch.FloatTensor of size 1x1x3]
)

What I don't understand is

  1. Is (0 ,.,.) an index? And shouldn't it initialize all three numbers, since we said torch.randn(1, 1, 3)?

  2. What is the difference between torch.randn(1, 1, 3) and torch.randn((1,1,3))?

Upvotes: 0

Views: 1473

Answers (1)

Dair

Reputation: 16240

First, to quickly answer number 2: they are identical. torch.randn accepts the sizes either as separate integer arguments or as a single tuple, so both calls produce a 1 x 1 x 3 tensor; I don't know why the tutorial mixes the two forms.
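
You can check this yourself with a quick sketch (the values are random, but the shapes are what matters):

import torch

a = torch.randn(1, 1, 3)    # sizes as separate integer arguments
b = torch.randn((1, 1, 3))  # sizes as a single tuple
print(a.size())  # torch.Size([1, 1, 3])
print(b.size())  # torch.Size([1, 1, 3])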

Next, to answer question 1:

hidden is a tuple that contains two Variables, each wrapping a 1 x 1 x 3 tensor.
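
For example, rebuilding the same hidden tuple from the question and inspecting it:

import torch
import torch.autograd as autograd

hidden = (autograd.Variable(torch.randn(1, 1, 3)),
          autograd.Variable(torch.randn(1, 1, 3)))
print(type(hidden))      # <class 'tuple'>
print(hidden[0].size())  # torch.Size([1, 1, 3])
print(hidden[1].size())  # torch.Size([1, 1, 3])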

Let's focus on what (0 ,.,.) means. If instead of a 1 x 1 x 3 tensor you had a 2 x 2 tensor, you could simply print out something like:

0.1 0.2
0.3 0.4

But it's kind of hard to represent 3-dimensional things on the screen. Even though it's a bit silly, having the additional 1 at the beginning changes what would otherwise be a 2-dimensional tensor into a 3-dimensional one. So instead, PyTorch prints out "slices" of the tensor. In this case, you only have one "slice", which happens to be the zeroth slice. Thus you get the additional (0 ,.,.) instead of it just printing out

  0.4251 -1.2328 -0.6195
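
Here is a minimal sketch of that difference, assuming the same older PyTorch version the question uses (where tensors print with slice headers); the random values will differ:

import torch

flat = torch.randn(2, 2)   # a plain 2 x 2 tensor
print(flat)                # printed as a simple 2 x 2 grid of numbers
cube = flat.view(1, 2, 2)  # same numbers, but the leading 1 makes it 3-dimensional
print(cube)                # now printed under a (0 ,.,.) slice header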

If instead the dimensions were 2 x 1 x 3 you could expect an output like:

(0 ,.,.) = 
 -0.3027 -1.1077  0.4724

(1 ,.,.) = 
  1.0063 -0.5936 -1.1589
[torch.FloatTensor of size 2x1x3]

And as you can see, every element in the tensor is actually initialized.
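
So (0 ,.,.) really is an index along the first dimension, and you can slice it out yourself (again, the random values won't match the numbers above):

import torch

t = torch.randn(2, 1, 3)
print(t)           # shows the (0 ,.,.) and (1 ,.,.) slices
print(t[0])        # just the zeroth slice, a 1 x 3 matrix
print(t[1, 0, 2])  # a single element; nothing is left uninitialized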

Upvotes: 1
