apostofes

Reputation: 3743

What is the difference between these two neural network structures?

The first uses nn.Parameter:

import torch
import torch.nn as nn

class ModelOne(nn.Module):
  def __init__(self):
    super().__init__()
    self.weights = nn.Parameter(torch.randn(300, 10))
    self.bias = nn.Parameter(torch.zeros(10))

  def forward(self, x):
    return x @ self.weights + self.bias

When I run

mo = ModelOne()
[len(param) for param in mo.parameters()]

it gives [300, 10]

The second uses nn.Linear:

class ModelTwo(nn.Module):
  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(300, 10)

  def forward(self, x):
    return self.linear(x)

The same check here gives [10, 10].

Upvotes: 2

Views: 187

Answers (1)

Harshit Kumar

Reputation: 12867

The difference lies in how nn.Linear initializes its weight and bias (abridged from the PyTorch source):

class Linear(Module):

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        ...

So when you write nn.Linear(300, 10), the weight has shape (10, 300) and the bias has shape (10). The weight is stored transposed because nn.Linear computes x @ weight.t() + bias in its forward pass. In ModelOne, by contrast, weights has shape (300, 10) and is multiplied directly as x @ self.weights.
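Up to that transpose, the two models describe the same linear map. Here's a quick sanity check (mo and mt are fresh instances; ModelOne's parameters are copied into ModelTwo with the weight transposed):

mo = ModelOne()
mt = ModelTwo()

# nn.Linear computes x @ weight.t() + bias, so ModelOne's (300, 10)
# weight corresponds to the transpose of nn.Linear's (10, 300) weight.
with torch.no_grad():
    mt.linear.weight.copy_(mo.weights.t())
    mt.linear.bias.copy_(mo.bias)

x = torch.randn(4, 300)
print(torch.allclose(mo(x), mt(x)))  # True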

You can confirm the shapes with named_parameters() (run the same loop on an instance of ModelTwo for the second case):

for name, param in mo.named_parameters():
    print(name, param.shape)

The output for ModelOne:

weights torch.Size([300, 10])
bias torch.Size([10])

In ModelTwo:

linear.weight torch.Size([10, 300])
linear.bias torch.Size([10])


Now, the reason you get [300, 10] in the first case and [10, 10] in the second is that calling len() on a tensor returns only the size of its first dimension:

a = torch.empty(10, 300)
b = torch.empty(10)
print(len(a), len(b))

10 10
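If the goal is to inspect parameter sizes, shape and numel() are more direct than len(). A small sketch:

mo = ModelOne()

# Print each parameter's full shape and its element count,
# not just the length of its first dimension.
for name, param in mo.named_parameters():
    print(name, tuple(param.shape), param.numel())

total = sum(p.numel() for p in mo.parameters())
print("total trainable parameters:", total)  # 300*10 + 10 = 3010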

Upvotes: 3
