Reputation: 135
I'm trying to understand a TensorFlow code snippet. What I've been taught is that we sum all the incoming inputs and then pass them to an activation function. Shown in the picture below is a single neuron. Notice that we compute a weighted sum of the inputs and THEN compute the activation.
In most examples of the multi-layer perceptron, they don't include the summation step. I find this very confusing.
Here is an example of one of those snippets:
import tensorflow as tf

# n_input, n_hidden_1, n_hidden_2 and n_classes are hyperparameters defined earlier in the example
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    # Hidden fully connected layer with 256 neurons
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    # Output fully connected layer with a neuron for each class
    out_layer = tf.nn.relu(tf.matmul(layer_2, weights['out']) + biases['out'])
    return out_layer
In each layer, we first multiply the inputs by the weights. Afterwards, we add the bias term. Then we pass the result to tf.nn.relu. Where does the summation happen? It looks like we've skipped this step!
Any help would be really great!
Upvotes: 0
Views: 363
Reputation: 992
The tf.matmul operator performs a matrix multiplication, which means that each element in the resulting matrix is a sum of products (which corresponds exactly to what you describe).
Take a simple example with a row vector and a column vector, as would be the case if you had exactly one neuron and an input vector (as per the graphic you shared above):
x = [2, 3, 1] (row vector), y = [3, 1, 2]^T (column vector)
Then the result would be:
tf.matmul(x, y) = 2*3 + 3*1 + 1*2 = 11
There you can see the weighted sum.
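In case it helps, here is a minimal runnable sketch of that same computation (a sketch only, assuming TensorFlow 2.x with eager execution; the toy numbers are the ones from above):

import tensorflow as tf

# One input sample with three features (a row vector, shape [1, 3])
x = tf.constant([[2.0, 3.0, 1.0]])
# Weights of a single neuron (a column vector, shape [3, 1])
w = tf.constant([[3.0], [1.0], [2.0]])

# The matrix multiplication is the weighted sum: 2*3 + 3*1 + 1*2 = 11
weighted_sum = tf.matmul(x, w)
print(weighted_sum)  # tf.Tensor([[11.]], shape=(1, 1), dtype=float32)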
P.S.: tf.multiply performs element-wise multiplication, which is not what we want here.
Upvotes: 1
Reputation: 8595
The last layer of your model, out_layer, outputs the probabilities of each class, Prob(y=yi|X), and has shape [batch_size, n_classes]. To calculate these probabilities, the softmax function is applied. For each single input data point x that your model receives, it outputs a vector of probabilities y whose size is the number of classes. You then pick the one with the highest probability by applying argmax to the output vector, class = argmax(P(y|x)), which can be written in TensorFlow as y_pred = tf.argmax(out_layer, 1).
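For illustration, a small sketch of that last step (assuming TensorFlow 2.x; the logit values below are made up):

import tensorflow as tf

# Hypothetical raw outputs for a batch of 2 samples and 3 classes
logits = tf.constant([[2.0, 0.5, 1.0],
                      [0.1, 3.0, 0.2]])

probs = tf.nn.softmax(logits, axis=1)  # per-class probabilities, shape [2, 3]
y_pred = tf.argmax(probs, axis=1)      # predicted class index per sample
print(y_pred)  # tf.Tensor([0 1], shape=(2,), dtype=int64)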
Consider a network with a single layer. You have an input matrix X of shape [n_samples, x_dimension], and you multiply it by some matrix W that has shape [x_dimension, model_output]. The summation that you're talking about is the dot product between a row of matrix X and a column of matrix W. The output will then have shape [n_samples, model_output]. On this output you apply the activation function (if it is the final layer, you probably want softmax). Perhaps the picture that you've shown is a bit misleading.
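For example, a quick shape check (toy sizes, assuming TensorFlow 2.x):

import tensorflow as tf

n_samples, x_dimension, model_output = 4, 3, 2  # toy sizes for illustration

X = tf.random.normal([n_samples, x_dimension])
W = tf.random.normal([x_dimension, model_output])

out = tf.matmul(X, W)  # each entry is the dot product of a row of X with a column of W
print(out.shape)       # (4, 2), i.e. [n_samples, model_output]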
Mathematically, the layer without bias can be described as Y = XW. Suppose that the first row of matrix X (the first row is a single input data point) is x = [x1, x2, ..., xn] and the first column of W is w = [w1, w2, ..., wn]^T. The result of this dot product is given by x·w = x1*w1 + x2*w2 + ... + xn*wn, which is your summation. You repeat this for each column in matrix W, and the result is a vector of size model_output (which corresponds to the number of columns in W). To this vector you add the bias (if needed) and then apply the activation.
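To make that concrete, here is a small sketch (toy numbers, assuming TensorFlow 2.x) that computes the per-column dot products by hand and checks them against tf.matmul:

import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])  # one input data point (a row of X)
W = tf.constant([[1.0, 4.0],
                 [2.0, 5.0],
                 [3.0, 6.0]])     # shape [x_dimension, model_output]

# Dot product of x with each column of W -- the summation from the question
manual = tf.stack([tf.reduce_sum(x * W[:, j]) for j in range(W.shape[1])])

# The same result with a single matrix multiplication
via_matmul = tf.matmul(tf.expand_dims(x, 0), W)[0]

print(manual.numpy())      # [14. 32.]
print(via_matmul.numpy())  # [14. 32.]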
Upvotes: 2