Reputation: 3743
First, using nn.Parameter:
import torch
import torch.nn as nn

class ModelOne(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(300, 10))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, x):
        return x @ self.weights + self.bias
When I do
mo = ModelOne()
[len(param) for param in mo.parameters()]
it gives [300, 10].
Second, using nn.Linear:
class ModelTwo(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(300, 10)

    def forward(self, x):
        return self.linear(x)
The same thing here gives [10, 10]. Why?
Upvotes: 2
Views: 187
Reputation: 12867
The difference lies in how nn.Linear
initializes weights and bias:
class Linear(Module):
    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        ...
So, when you write nn.Linear(300, 10), the weight has shape (10, 300) and the bias has shape (10). In ModelOne, however, weights has shape (300, 10) and bias has shape (10).
You can confirm this using
for name, param in mo.named_parameters():
    print(name, param.shape)
The output in ModelOne:
weights torch.Size([300, 10])
bias torch.Size([10])
In ModelTwo:
linear.weight torch.Size([10, 300])
linear.bias torch.Size([10])
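Despite the transposed storage, both models compute the same kind of mapping: nn.Linear's forward pass multiplies the input by the transpose of its stored weight (x @ weight.t() + bias). A minimal sketch to check this (the batch size of 4 is arbitrary):

import torch
import torch.nn as nn

x = torch.randn(4, 300)      # batch of 4 inputs with 300 features each
linear = nn.Linear(300, 10)

# weight is stored as (out_features, in_features) = (10, 300),
# so the layer applies its transpose in the forward pass
manual = x @ linear.weight.t() + linear.bias
print(torch.allclose(linear(x), manual))  # True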
Now, the reason you get [300, 10] in the first case and [10, 10] in the second is that len() on a tensor returns only the size of its first dimension:
a = torch.Tensor(10, 300)
b = torch.Tensor(10)
print(len(a), len(b))
10 10
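If the goal is to inspect the parameters rather than take their len(), printing full shapes or counting elements with numel() avoids this surprise. A quick sketch, reusing the two model classes from the question:

mo = ModelOne()
mt = ModelTwo()

# full shapes instead of len() (which only reports the first dimension)
print([tuple(p.shape) for p in mo.parameters()])   # [(300, 10), (10,)]
print([tuple(p.shape) for p in mt.parameters()])   # [(10, 300), (10,)]

# both models have the same total number of trainable scalars
print(sum(p.numel() for p in mo.parameters()))     # 3010
print(sum(p.numel() for p in mt.parameters()))     # 3010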
Upvotes: 3