Reputation: 3135
Please excuse the novice question, but is Module just the same as saying model?
That's what it sounds like, when the documentation says:
Whenever you want a model more complex than a simple sequence of existing Modules you will need to define your model (as a custom Module subclass).
Or... when they mention Module, are they referring to something more formal and computer-sciency, like a protocol / interface type thing?
Upvotes: 30
Views: 13667
Reputation: 3009
The exact definition of a 'Module' in PyTorch can be found at https://pytorch.org/docs/stable/notes/modules.html:
PyTorch uses modules to represent neural networks. Modules are:
- Building blocks of stateful computation. PyTorch provides a robust library of modules and makes it simple to define new custom modules, allowing for easy construction of elaborate, multi-layer neural networks.
- Tightly integrated with PyTorch’s autograd system. Modules make it simple to specify learnable parameters for PyTorch’s Optimizers to update.
- Easy to work with and transform. Modules are straightforward to save and restore, transfer between CPU / GPU / TPU devices, prune, quantize, and more.
They also provide an example of a custom module. In the code below, MyLinear is a module.
import torch
from torch import nn

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, input):
        return (input @ self.weight) + self.bias
PyTorch's definition of a module should not be confused with Python's definition of a module.
Python defines a module as:
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.
Usually MyLinear is not a file. It is a class, not a module, according to Python's definition. For more details, see Difference between Module and Class in Python. However, in PyTorch's terminology, it is considered a module.
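To make the distinction concrete, here is a minimal sketch (the checks and variable names are my own, not from the docs) contrasting the two meanings: torch.nn is a module in Python's sense, while an instance of MyLinear is a module in PyTorch's sense.

```python
import types

import torch
from torch import nn


class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, input):
        return (input @ self.weight) + self.bias


layer = MyLinear(4, 3)

# Python's sense: torch.nn is a module object (a package on disk).
is_python_module = isinstance(torch.nn, types.ModuleType)

# MyLinear itself is a class, not a Python module.
class_is_python_module = isinstance(MyLinear, types.ModuleType)

# PyTorch's sense: an instance of an nn.Module subclass.
is_torch_module = isinstance(layer, nn.Module)

# Attributes wrapped in nn.Parameter are registered automatically.
param_names = [name for name, _ in layer.named_parameters()]
out_shape = tuple(layer(torch.randn(2, 4)).shape)
```

So the same word covers two unrelated concepts, and only the PyTorch one gives you parameter tracking.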
Upvotes: 0
Reputation: 4513
It's a simple container.
From the docs of nn.Module
Base class for all neural network modules. Your models should also subclass this class. Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes. Submodules assigned in this way will be registered, and will have their parameters converted too when you call .cuda(), etc.
From the tutorial:
All network components should inherit from nn.Module and override the forward() method. That is about it, as far as the boilerplate is concerned. Inheriting from nn.Module provides functionality to your component. For example, it makes it keep track of its trainable parameters, you can swap it between CPU and GPU with the .to(device) method, where device can be a CPU device torch.device("cpu") or CUDA device torch.device("cuda:0").
A module is a container from which layers, model subparts (e.g. BasicBlock in resnet in torchvision) and models should inherit. Why should they? Because the inheritance from nn.Module allows you to call methods like .to("cuda:0"), .eval(), .parameters() or register hooks easily.
That's an API design choice, and I find having only a Module class instead of two separate Model and Layer classes to be cleaner and to allow more freedom (it's easier to send just a part of the model to the GPU, to get parameters only for some layers...).
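As a rough illustration of that freedom, here is a small sketch. TinyNet and its layer sizes are made up for the example; the point is only that the whole model and each of its parts expose the same nn.Module methods.

```python
import torch
from torch import nn


class TinyNet(nn.Module):  # hypothetical model, not from torchvision
    def __init__(self):
        super().__init__()
        # Submodules assigned as attributes are registered automatically.
        self.features = nn.Linear(8, 4)
        self.head = nn.Linear(4, 2)

    def forward(self, x):
        return self.head(torch.relu(self.features(x)))


net = TinyNet()

child_names = [name for name, _ in net.named_children()]

# The same .parameters() call works on the whole model or on one part.
total_params = sum(p.numel() for p in net.parameters())
head_params = sum(p.numel() for p in net.head.parameters())

# .eval() propagates to every registered submodule.
net.eval()

# Only one part of the model is moved; "cpu" is used so this runs anywhere.
net.head.to(torch.device("cpu"))
```

With separate Model and Layer classes, each of these calls would need two implementations.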
Upvotes: 27
Reputation: 68
why not just call the 'module' a model, and call the layers 'layers'?
Recall how a binary tree is defined in a data structures course:
class tree:
    def __init__(self, value, left, right):
        self.value = value
        self.left = left
        self.right = right
You can add a subtree or a leaf to a tree to form a new tree, just as you can add a submodule to a module to form a new module. You don't want subtree and tree to be two different data structures, nor leaf and tree, because after all they are all trees. In the same way, you want Module to represent both models and layers. Think of it as recursive; it is an API design choice to keep things clean.
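The tree analogy can be checked directly. This is just a sketch with arbitrary layer sizes: a small nn.Sequential is nested inside a bigger one, and modules() walks the resulting tree, where every node (root, subtree or leaf) is an nn.Module.

```python
from torch import nn

# A small "subtree" made of two leaf modules.
block = nn.Sequential(nn.Linear(4, 8), nn.ReLU())

# A bigger tree that contains the subtree plus one more leaf.
model = nn.Sequential(block, nn.Linear(8, 2))

# modules() yields the module itself and all of its descendants.
all_nodes = list(model.modules())

# Leaves are the modules that have no children of their own.
leaves = [m for m in all_nodes if not list(m.children())]

every_node_is_module = all(isinstance(m, nn.Module) for m in all_nodes)
```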
What exactly is the definition of a 'Module' in PyTorch?
I like to think of a module as something that takes an input and produces an output, just like a function. That is what the forward method of the Module class does (it specifies what the function is), and you need to override the default forward method because otherwise PyTorch would not know what the function is:
def forward(self, x):
    x = F.relu(self.conv1(x))
    x = F.max_pool2d(x, 2, 2)
    x = F.relu(self.conv2(x))
    x = F.max_pool2d(x, 2, 2)
    x = x.view(-1, 4*4*50)
    x = F.relu(self.fc1(x))
    x = self.fc2(x)
    return F.log_softmax(x, dim=1)
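A tiny sketch of the "module as a function" idea (Double and NoForward are names I made up for this example): calling a module instance runs its forward method, and a module that never overrides forward raises NotImplementedError when called.

```python
import torch
from torch import nn


class Double(nn.Module):  # hypothetical module: f(x) = 2x
    def forward(self, x):
        return 2 * x


class NoForward(nn.Module):  # hypothetical module with no forward override
    pass


# Calling the instance dispatches to forward().
doubled = Double()(torch.tensor([1.0, 2.0]))

# Without an overridden forward, PyTorch does not know what to compute.
try:
    NoForward()(torch.tensor([1.0]))
    raised = False
except NotImplementedError:
    raised = True
```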
Another example is nn.Sequential. It is also a module, but a special one: it takes modules as arguments and chains their inputs and outputs together.
nn.Sequential(a, b, c)  # a -> b -> c
That's why you do not need to specify a forward method: it is defined implicitly (the output of each module is fed to the next one).
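Here is a quick sketch of that chaining, with layer sizes picked arbitrarily: running the nn.Sequential gives the same result as applying the three modules by hand, one after the other.

```python
import torch
from torch import nn

torch.manual_seed(0)

a = nn.Linear(4, 8)
b = nn.ReLU()
c = nn.Linear(8, 2)

chain = nn.Sequential(a, b, c)  # a -> b -> c

x = torch.randn(3, 4)

# The implicit forward of nn.Sequential...
chained = chain(x)
# ...matches manual composition of the same modules.
manual = c(b(a(x)))

same_result = torch.allclose(chained, manual)
out_shape = tuple(chained.shape)
```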
Another example is Conv2d. It is also a module, and its forward method is already defined, so you don't need to specify it:
class _ConvNd(Module):
    # omit

class Conv2d(_ConvNd):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1,
                 bias=True, padding_mode='zeros'):
        kernel_size = _pair(kernel_size)
        stride = _pair(stride)
        padding = _pair(padding)
        dilation = _pair(dilation)
        super(Conv2d, self).__init__(
            in_channels, out_channels, kernel_size, stride, padding, dilation,
            False, _pair(0), groups, bias, padding_mode)

    def conv2d_forward(self, input, weight):
        if self.padding_mode == 'circular':
            expanded_padding = ((self.padding[1] + 1) // 2, self.padding[1] // 2,
                                (self.padding[0] + 1) // 2, self.padding[0] // 2)
            return F.conv2d(F.pad(input, expanded_padding, mode='circular'),
                            weight, self.bias, self.stride,
                            _pair(0), self.dilation, self.groups)
        return F.conv2d(input, weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

    def forward(self, input):
        return self.conv2d_forward(input, self.weight)
Also, if anyone wonders how PyTorch builds a graph and does backpropagation...
check this out... (please do not take this code too seriously, since I am not sure this is how PyTorch implements it, but take the idea with you; it may help you understand how PyTorch works)
some silly code
Hope this helps :)
PS: I am new to deep learning and PyTorch. This likely contains some mistakes, so read carefully...
Upvotes: 0
Reputation: 46479
why not just call the 'module' a model, and call the layers 'layers'?
This is by inheritance: PyTorch inherited from Torch, which was originally written in Lua, and there it was called a module.
What exactly is the definition of a 'Module' in PyTorch?
There are different kinds of definitions in general.
Here is one pragmatic:
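This one is structural: a module holds state, and you can list it with module.parameters().
This one is functional: you can call module.zero_grad() to reset the gradients of all parameters inside (to zero, or to None in recent versions). This is something we should do once per training iteration, before the next backward pass, because gradients otherwise accumulate. This shows that a module also has to deal with backprop, the step that computes the gradients used to update the parameters marked for update.
Module parameters marked for update have requires_grad=True, like this: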
Parameter containing:
tensor([-0.4411, -0.2094, -0.5322, -0.0154, -0.1009], requires_grad=True)
You can say parameters are just tensors that a module registers as its own and that have requires_grad=True by default. Through requires_grad you decide whether they should be updated during training or not.
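A short sketch of these pieces working together, using an nn.Linear with sizes chosen only for illustration. Note that what zero_grad() leaves behind depends on the PyTorch version (zeros in older releases, None in recent ones), so the check below accepts both.

```python
import torch
from torch import nn

layer = nn.Linear(3, 1)

# Parameters are created with requires_grad=True by default.
all_trainable = all(p.requires_grad for p in layer.parameters())

# Backprop fills in .grad for every parameter that requires it.
loss = layer(torch.ones(2, 3)).sum()
loss.backward()
grad_after_backward = layer.weight.grad.clone()

# zero_grad() resets the gradients (to zeros or to None, depending on version).
layer.zero_grad()
grad = layer.weight.grad
grads_cleared = grad is None or bool(torch.all(grad == 0))

# Freezing a parameter: it will no longer receive gradients.
layer.bias.requires_grad_(False)
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
```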
Finally, back to the forward step for an important note:
class ZebraNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()  # super(self) would raise a TypeError
        self.convpart = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpooling = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.convpart(x)
        x = self.avgpooling(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
You see how the structure is set in __init__ and how forward() tells you what will happen with the input x and what will be returned. This return value will have the dimension of the output we need. Depending on how precisely we predict the output, we get worse or better accuracy, which is usually the metric we use to track our progress.
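One detail of that forward() worth a small sketch: the adaptive pooling layer is what makes the hard-coded 256 * 6 * 6 safe. The input sizes below are arbitrary stand-ins for whatever the convolutional part produces; whatever the spatial size, the pooled output is 6 x 6.

```python
import torch
from torch import nn

pool = nn.AdaptiveAvgPool2d((6, 6))

# Two feature maps with different spatial sizes (13x13 and 20x20).
small = torch.randn(2, 256, 13, 13)
large = torch.randn(2, 256, 20, 20)

pooled_small = pool(small)
pooled_large = pool(large)

# Both can be flattened to the size the first Linear layer expects.
flat = pooled_large.view(pooled_large.size(0), 256 * 6 * 6)

pooled_shapes = (tuple(pooled_small.shape), tuple(pooled_large.shape))
flat_shape = tuple(flat.shape)
```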
Upvotes: 4
Reputation: 14054
Without being a PyTorch expert, it is my understanding that a module in the context of PyTorch is simply a container, which receives tensors as input and computes tensors as output.
So, in conclusion, your model is quite likely to be composed of multiple modules; for example, you might have 3 modules, each representing a layer of a neural network. Thus, they are related in the sense that you need modules to actualise your model, but they aren't the same thing.
Hope that helps
Upvotes: 6