Reputation: 2850
I have a custom bidirectional LSTM model where the custom part is:
- extract the forward and backward last hidden states
- concatenate those states
- create a fully connected layer and pass it through a softmax layer.
The code looks like this:
class customModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(customModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bilstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=False, bidirectional=True)
        self.fcl = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Set initial hidden and cell states
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)

        # Forward propagate LSTM
        out, hidden = self.bilstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size)

        # concat hidden state of forward and backward
        fw_bilstm = out[-1, :, :self.hidden_size]
        bk_bilstm = out[0, :, :self.hidden_size]
        concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim=1)
        fc = nn.Linear(concat_fw_bw, num_classes)
        x = F.relu(fc(x))
        return F.softmax(x)
I use the following parameters and input:
input_size = 2
hidden_size = 32
num_layers = 1
num_classes = 2
input_embedding = [
    torch.FloatTensor([[-0.8264], [0.2524]]),
    torch.FloatTensor([[-0.3259], [0.3564]])
]
Then I create a model object
model = customModel(input_size, hidden_size, num_layers, num_classes)
I then use it like this:
for item in input_embedding:
    print(item.size())
    for epoch in range(1):
        pred = model(item)
        print(pred)
When I run it, the line out, hidden = self.bilstm(x, (h0, c0)) fails with the error:
RuntimeError: input must have 3 dimensions, got 2
I am not sure why the model expects the input to have 3 dimensions when I explicitly specified input_size=2.
What am I missing?
Upvotes: 0
Views: 426
Reputation: 904
You seem to be missing a (batch or sequence) dimension in your input.
There is a difference between nn.LSTM and nn.LSTMCell. The former, which is the one you use, takes whole sequences as inputs, so it needs 3-dimensional inputs of shape (seq_len, batch, input_size).
Let's say you want to give these 4 sequences of letters (each encoded as a one-hot vector) as inputs in the form of a batch:
x0 = [a,b,c]
x1 = [c,d,e]
x2 = [e,f,g]
x3 = [h,i,j]
### input.size() should give you the following:
(3,4,8)
- The seq_len parameter is the length of the sequences: here, 3.
- The input_size parameter is the size of each input vector: here, each input is a one-hot vector of size 8.
- batch is the number of sequences you put together: here, there are 4 sequences.

NB: It can be easier to grasp by putting the batch dimension first and setting batch_first=True (the sketch below keeps the default seq-first layout).
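To make those shapes concrete, here is a minimal runnable sketch; the hidden size of 5 and the random data are arbitrary, purely for illustration:

import torch
import torch.nn as nn

# 4 sequences of length 3, each step an input vector of size 8
batch = torch.randn(3, 4, 8)  # (seq_len, batch, input_size)

lstm = nn.LSTM(input_size=8, hidden_size=5)  # hidden_size=5 picked arbitrarily
out, (hn, cn) = lstm(batch)

print(out.shape)  # torch.Size([3, 4, 5])  -> (seq_len, batch, hidden_size)
print(hn.shape)   # torch.Size([1, 4, 5])  -> (num_layers * num_directions, batch, hidden_size)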
Also: if (h_0, c_0) is not provided, both h_0 and c_0 default to zeros, so there is no need to create them yourself.
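Putting both points together for your input: each item in input_embedding has shape (2, 1), i.e. only 2 dimensions. Here is a minimal sketch of adding the missing batch dimension and calling the LSTM without explicit initial states. It assumes each item is one sequence of length 2 with a single feature per step, which would mean input_size=1 rather than 2; adjust the reshape if you intended a different layout.

import torch
import torch.nn as nn

item = torch.FloatTensor([[-0.8264], [0.2524]])  # shape (2, 1) -> only 2 dimensions
item_3d = item.unsqueeze(1)                      # shape (2, 1, 1) -> (seq_len, batch, input_size)

# Assumed sizes for illustration: input_size=1, hidden_size=32, bidirectional like your model
bilstm = nn.LSTM(input_size=1, hidden_size=32, num_layers=1, bidirectional=True)

# No (h0, c0) passed: both default to zeros of the right shape
out, (hn, cn) = bilstm(item_3d)
print(out.shape)  # torch.Size([2, 1, 64]) -> last dim is 2 * hidden_size for a bidirectional LSTM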
Upvotes: 1