Reputation: 23
I'm studying the basics of neural networks with PyTorch, and I'm having a hard time understanding how an activation function with trainable parameters should work.
I don't understand what shape the trainable parameters of my activation function should have. Should they have the same shape as the whole input dataset, or the shape of a single element in the dataset?
From what I understand, the activation function takes the whole dataset (or batch) as input, but I'm unsure how to initialize the parameters.
class customModel(nn.Module):
    def __init__(self, units):
        super(customModel, self).__init__()
        self.p1 = nn.Parameter(torch.ones(units))
        self.p2 = nn.Parameter(torch.ones(units))
        self.b1 = nn.Parameter(torch.zeros(units))
        self.b2 = nn.Parameter(torch.zeros(units))

    def forward(self, inputs):
        out = myCustomActivationFunction(inputs, self.p1, self.p2, self.b1, self.b2)
        return out
Upvotes: -1
Views: 534
Reputation: 121
An activation function is used to create non-linearity between layers, which are otherwise purely linear (without an activation function). We usually choose the activation function based on the task: for example, ReLU between hidden layers to introduce non-linearity, and sigmoid in the output layer of a binary classifier to squash values into the 0-1 range, so a 0.5 threshold can separate the two classes.
To fully grasp how activation functions are used in neural networks, we first need a clear understanding of the pieces involved.
To understand neural networks, I recommend understanding the linear regression model first, since it makes it easier to understand what a weight is.
y = mx + b is a linear function that can be used as a simple model to predict data with a linear correlation (we call this model linear regression),
with "x" as the input, "y" as the output, and "m, b" as the parameters.
These "m" and "b" are trainable parameters, while x is an input feature.
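As a minimal sketch, here is y = mx + b with m and b as trainable PyTorch parameters (the initial values 1 and 0 are just illustrative):

```python
import torch

# y = m*x + b with trainable parameters, analogous to the "m" and "b" above.
# nn.Parameter marks a tensor as trainable, so an optimizer would update it.
m = torch.nn.Parameter(torch.ones(1))   # initialized to 1
b = torch.nn.Parameter(torch.zeros(1))  # initialized to 0

x = torch.tensor([1.0, 2.0, 3.0])  # input features
y = m * x + b                      # model prediction
print(y)
```

With m = 1 and b = 0 the prediction is just x itself; training would adjust m and b to fit the data.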
More Explanation About Linear Regression
(It's a bit hard to explain since I can't attach images at my reputation level, so I'll attach a video link instead.)
Assuming you are already familiar with linear regression:
A neural network is like a chain of connected linear models; each unit is called a neuron, and neurons are stacked into layers.
Layer 1 example (the first layer is generally called the input layer, because it's the layer we feed our features into):
[x1]
[x2]
[x3]
Each neuron in a layer has a "line" connecting it to every neuron in the next layer.
Each line carries its own w (weight), which is a trainable parameter,
just like the "m" and "b" we can train in y = mx + b.
During computation, the inputs are placed in the x's of the input layer,
then each is multiplied by the weight on the line connecting it to each neuron of the next layer, and the products are summed up at the destination.
The formula for neuron j of the next layer is:
Yj = sum_i(Xi * Wij)
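A sketch of that weighted sum for a single destination neuron, with made-up values:

```python
import torch

# One neuron's output is the weighted sum of the previous layer's
# outputs: y = sum(x_i * w_i).  Values here are purely illustrative.
x = torch.tensor([1.0, 2.0, 3.0])   # previous-layer outputs x1..x3
w = torch.tensor([0.5, -1.0, 2.0])  # weights on the connecting lines
y = (x * w).sum()                   # 1*0.5 + 2*(-1) + 3*2 = 4.5
print(y)
```

This is exactly what `nn.Linear` does internally for every neuron in a layer, just vectorized as a matrix multiplication.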
To simplify: you can think of it as computing two linear regression models separately,
then summing their outputs and using the sum as the input for the next linear regression model, which predicts the gender.
It is at this point that we really need the activation function. Assume the previous layers provide the information needed to predict the BIOLOGICAL gender, given height and weight.
The range of the output layer, which is just a
y = mx + b formula,
is -infinity to +infinity.
How would you classify such an output into two classes?
The answer is by using an activation function such as sigmoid, which squashes any range of values into the 0-1 range. Thinking of this as a probability, we can use a 0.5 cutoff threshold: below 0.5 is class "0", above 0.5 is class "1".
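A small sketch of that thresholding step, using illustrative raw outputs:

```python
import torch

# Sigmoid squashes any real-valued output into (0, 1),
# so a 0.5 cutoff splits the result into two classes.
logits = torch.tensor([-3.0, 0.0, 2.5])  # raw outputs, range (-inf, inf)
probs = torch.sigmoid(logits)            # each value now lies in (0, 1)
classes = (probs > 0.5).long()           # 0.5 threshold -> class 0 or 1
print(classes)  # tensor([0, 0, 1])
```

Note that sigmoid(0) is exactly 0.5, which falls on the threshold itself; here the strict `>` assigns it to class 0.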
As you can see, we don't train the activation function itself; it's the trainable parameters (the weights) that get trained. The activation function is there to create non-linearity between layers, and we usually choose it based on the task, such as sigmoid for binary classification.
To implement a custom activation, just create a function that receives one input and returns a transformed output:
def custom_act(x):
    return -x
In case you need it to be trainable (which usually isn't necessary):
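To answer the shape question directly, here is a hedged sketch of a trainable activation (the module name `TrainableAct` and the tanh-based formula are my own illustration, not from the question): the parameter has the shape of a single sample's features (`units`), not the shape of the whole dataset; broadcasting applies it across the batch dimension.

```python
import torch
import torch.nn as nn

class TrainableAct(nn.Module):
    # The parameter has the shape of ONE sample's features ("units"),
    # not the whole dataset; broadcasting handles the batch dimension.
    def __init__(self, units):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(units))

    def forward(self, x):              # x: (batch_size, units)
        return torch.tanh(self.alpha * x)

act = TrainableAct(units=4)
x = torch.randn(8, 4)                  # a batch of 8 samples, 4 features each
out = act(x)
print(out.shape)  # torch.Size([8, 4])
```

This mirrors how PyTorch's built-in `nn.PReLU(num_parameters=...)` sizes its trainable parameter: one value per feature channel, shared across all samples in the batch.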
Refer to this question, which already has a good explanation: Pytorch custom activation functions?
Using Sigmoid for Logistic Regression
Activation Function Explain For Binary Classification and Keras Implementation
Upvotes: 2