Reputation: 427
I'm attempting to extract the weights and biases from a simple network built in PyTorch. My entire network is composed of nn.Linear layers. When I create a layer by calling nn.Linear(in_dim, out_dim), I expect the parameters returned by model.parameters() to be of shape (in_dim, out_dim) for the weight and (out_dim) for the bias. However, the weights that come out of model.parameters() are instead of shape (out_dim, in_dim).
My intention is to perform a forward pass with plain matrix multiplication in numpy, without any PyTorch. Because of this shape mismatch, the matrix multiplications throw an error. How can I fix it?
Here is my exact code:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, dim_input, dim_recurrent, dim_output):
        super(RNN, self).__init__()
        self.dim_input = dim_input
        self.dim_recurrent = dim_recurrent
        self.dim_output = dim_output
        self.dense1 = nn.Linear(self.dim_input, self.dim_recurrent)
        self.dense2 = nn.Linear(self.dim_recurrent, self.dim_recurrent, bias=False)
        self.dense3 = nn.Linear(self.dim_input, self.dim_recurrent)
        self.dense4 = nn.Linear(self.dim_recurrent, self.dim_recurrent, bias=False)
        self.dense5 = nn.Linear(self.dim_recurrent, self.dim_output)

    # There is a defined forward pass

model = RNN(12, 100, 6)
for i in model.parameters():
    print(i.shape)
The output is:
torch.Size([100, 12])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 12])
torch.Size([100])
torch.Size([100, 100])
torch.Size([6, 100])
torch.Size([6])
The output should, if I'm correct, be:
torch.Size([12, 100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([12, 100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 6])
torch.Size([6])
What is my issue?
Upvotes: 2
Views: 3860
Reputation: 24099
What you see there is not an error; (out_dim, in_dim) is simply the shape in which PyTorch stores the weight matrix of each layer. When you call print(model), you can see that the input and output features are correct:
RNN(
(dense1): Linear(in_features=12, out_features=100, bias=True)
(dense2): Linear(in_features=100, out_features=100, bias=False)
(dense3): Linear(in_features=12, out_features=100, bias=True)
(dense4): Linear(in_features=100, out_features=100, bias=False)
(dense5): Linear(in_features=100, out_features=6, bias=True)
)
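If you want to see which of those shapes belongs to which layer, you can also iterate over model.named_parameters(), which yields the parameter names alongside the tensors:
for name, param in model.named_parameters():
    print(name, param.shape)
This prints, for example, dense1.weight torch.Size([100, 12]), making it easy to match each stored shape to its layer.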
You can check the source code to see that the weights are actually transposed before matmul is called.
nn.Linear is defined here:
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear
Its forward method looks like this:
def forward(self, input):
    return F.linear(input, self.weight, self.bias)
F.linear is defined here:
https://pytorch.org/docs/stable/_modules/torch/nn/functional.html
The relevant line, where the weights are multiplied, is:
output = input.matmul(weight.t())
As mentioned above, the weights are transposed before matmul is applied, which is why their stored shape differs from what you expected.
So if you want to do the matrix multiplication manually, you do:
import torch

# dummy batch of 5 inputs, each of length 12
input = torch.rand(5, 12)
# apply layer dense1 (without bias; for the bias just add + model.dense1.bias)
output_first_layer = input.matmul(model.dense1.weight.t())
print(output_first_layer.shape)
Just as you would expect from your dense1, it returns:
torch.Size([5, 100])
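And since your goal is to run the forward pass with only numpy, the same one-layer computation can be done like this (a minimal sketch; I'm converting the parameters to numpy arrays with .detach().numpy() first):
import numpy as np

# pull the parameters out of the model as numpy arrays
W1 = model.dense1.weight.detach().numpy()  # shape (100, 12)
b1 = model.dense1.bias.detach().numpy()    # shape (100,)

x = np.random.rand(5, 12)                  # dummy batch of 5 inputs

# transpose the weight, just like F.linear does internally
output_first_layer = x @ W1.T + b1
print(output_first_layer.shape)            # (5, 100)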
I hope this explains your observations with the shape :)
Upvotes: 3