Reputation: 95
I'm learning about neural networks and they are some of the neatest things I've come across.
My question is: how do you compute the output of a neural network with arbitrary topology? Is there some algorithm or rule of thumb to use?
For example, I understand that feed-forward networks have straightforward matrix representations, but what about networks with loops or with outputs connected to inputs? Is there a matrix form for those? Or is the only way to produce output to do some kind of graph-traversal?
Example:
Upvotes: 3
Views: 456
Reputation: 810
Let's look at the picture of your neural network structure attached to your question.
The connections of an artificial neural network do not form an arbitrary directed graph, as it might first seem.
There are additional restrictions: nodes come in different types and are organized into layers.
There are input nodes, hidden nodes, and output nodes.
Put simply, input nodes (neuron values) are read-only; there is no way to modify them. That's why the connection between nodes 9 and 4 is meaningless, as is input 4 itself, because its signal does not propagate further.
The same holds for the connection between nodes 8 and 11.
You may look here: the basics of neural networks are explained in a simple way.
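As a concrete illustration of the layered, matrix form mentioned in the question, here is a minimal feed-forward forward pass. All sizes, weights, and the tanh activation are made-up examples, not taken from the pictured network:

```python
import numpy as np

def f(z):
    return np.tanh(z)  # activation function (illustrative choice)

rng = np.random.default_rng(0)
x = rng.standard_normal(3)               # 3 input nodes (read-only values)
W_hidden = rng.standard_normal((4, 3))   # weights: inputs -> 4 hidden nodes
W_out = rng.standard_normal((2, 4))      # weights: hidden -> 2 output nodes

# Each layer is just f(W @ previous_layer): a matrix-vector product
# followed by an element-wise activation.
h = f(W_hidden @ x)  # hidden layer values
y = f(W_out @ h)     # network output, shape (2,)
```

Because the graph is layered and acyclic, one matrix multiplication per layer is enough; no general graph traversal is needed.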
When we talk about networks with loops, we mean recurrent networks.
Suppose we have the recurrent neural network shown in the picture below.
How would the output be calculated?
We can try to apply the same calculation rule as for a feed-forward network:
h(t+1) = f(Wx·x(t+1) + Wh·h(t)), where f is the activation function, x(t) is the input at step t, and h(t) is the vector of hidden-node values at step t.
But wait a second: don't we need to know the previous values h(t) before we can compute h(t+1)?
Technically, this is not a recursion.
You can read it as "the next value h(t+1) of a node depends on the current value h(t)".
The dynamics of the network depicted can be visualized by “unfolding” (shown below).
So a recurrent network can be treated as a deep network with a single layer per time step and shared weights across time steps. Here, we treat the step-0 hidden layer as input for step 1.
Back to our case, the formula for the first step should look like h(1) = f(Wx·x(1) + Wh·h(0)).
Similarly, the second step can be calculated as h(2) = f(Wx·x(2) + Wh·h(1)).
To simplify, the initial values h(0) can be set to zeros (though in practice the initial state is often trained as a model parameter). Unfolding is used for training: technically, we simply replace the recurrent network with a series of feed-forward hidden layers.
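This unfolding can be sketched directly in code. Assuming an update of the form h(t+1) = f(Wx·x(t+1) + Wh·h(t)), with the same weight matrices reused at every time step and a zero-initialized h(0); all sizes and weights here are illustrative:

```python
import numpy as np

def f(z):
    return np.tanh(z)  # activation function (illustrative choice)

rng = np.random.default_rng(1)
n_in, n_hidden, T = 3, 4, 5                  # made-up sizes and sequence length
W_x = rng.standard_normal((n_hidden, n_in))  # input -> hidden weights
W_h = rng.standard_normal((n_hidden, n_hidden))  # hidden -> hidden (recurrent)
xs = rng.standard_normal((T, n_in))          # input sequence x(1)..x(T)

h = np.zeros(n_hidden)  # h(0) initialized with zeros
states = []
for t in range(T):
    # One "layer" of the unfolded network; W_x and W_h are shared across steps.
    h = f(W_x @ xs[t] + W_h @ h)
    states.append(h)
```

Each loop iteration is an ordinary feed-forward layer, which is exactly why the unfolded network can be trained like a deep feed-forward one.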
To conclude, recurrent networks still have matrix representations and operations, although this is less obvious than in the feed-forward case.
Upvotes: 3