Reputation: 160
This question is less about programming itself and more about the logic behind the CNN architecture. I understand how every layer works, but my only question is: does it make sense to separate the ReLU and the convolution layer? I mean, can a ConvLayer exist, work, and update its weights via backpropagation without having a ReLU behind it?
I thought so. This is why I created the following independent layers:
I am thinking about merging Layers 1 and 2 into one. What should I go for?
Upvotes: 1
Views: 1583
Reputation: 124
From the TensorFlow perspective, all of your computations are nodes in the computation graph (which is then executed in a session). So if you want to separate layers, which just means adding nodes to your computation graph, go ahead, but I don't see any practical reason behind it. You can backpropagate through it, of course, since you are just calculating the gradient of every node by differentiation.
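As a minimal sketch in tf.keras (the 28x28x1 input shape, 32 filters, and 3x3 kernel are made-up example values), writing the ReLU as its own layer and fusing it into the convolution build the same kind of computation graph, just with one more node in the first case:

    import tensorflow as tf

    # Convolution and ReLU as two separate graph nodes (layers).
    separate = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, input_shape=(28, 28, 1)),  # linear convolution only
        tf.keras.layers.ReLU(),                                  # non-linearity as its own node
    ])

    # The same computation fused into a single layer.
    fused = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    ])

Both versions compute identical outputs given identical weights; the only difference is how the graph is organized.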
Upvotes: 0
Reputation: 337
Can it exist?
Yes. It can. There is nothing that stops neural networks from working without non-linearity modules in the model. The thing is, skipping the non-linearity module between two adjacent layers is equivalent to a single linear combination of the inputs at layer 1 to get the output at layer 2.
M1 : Input =====> L1 ====> ReLU ====> L2 =====> Output
M2 : Input =====> L1 ====> ......... ====> L2 =====> Output
M3 : Input =====> L1 =====> Output
M2 & M3 are equivalent since the parameters adjust themselves over the training period to generate the same output. If there is any pooling involved in between, this may not be true but as long as the layers are consecutive, the network structure is just one large linear combination (Think PCA)
There is nothing that prevents the gradient updates & back-propagation throughout the network.
What should you do?
Keep some form of non-linearity between distinct layers. You may create convolution blocks that contain more than one convolution layer, but you should include a non-linear function at the end of each block, and definitely after the dense layers. For dense layers, not using an activation function makes a stack of them completely equivalent to a single layer. A sketch of such a block layout follows below.
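Here is one way that layout could look in tf.keras (the filter counts, kernel sizes, and 32x32x3 input shape are made up for illustration): two convolutions share one ReLU at the end of the block, while each dense layer gets its own activation.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, padding="same", input_shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, 3, padding="same"),
        tf.keras.layers.ReLU(),                        # non-linearity closes the conv block
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),  # without this, stacked dense layers collapse into one
        tf.keras.layers.Dense(10, activation="softmax"),
    ])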
Have a look here: Quora: Role of activation functions
Upvotes: 2
Reputation: 1
The short answer is: ReLU (or another activation mechanism) should be added to each of your convolution or fully connected layers.
CNNs, and neural networks in general, use activation functions like ReLU to introduce non-linearity into the model. Activation functions are usually not a layer themselves; they are an additional computation applied to each node of a layer. You can see them as an implementation of the mechanism that decides between finding vs. not finding a specific pattern. See this post.
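As a rough illustration (a hypothetical relu helper, not any library's implementation), the activation is just an element-wise function applied on top of a layer's linear output:

    import numpy as np

    def relu(x):
        # Negative responses ("pattern not found") are zeroed out;
        # positive responses ("pattern found") pass through unchanged.
        return np.maximum(0.0, x)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]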
Upvotes: 0