echo

Reputation: 135

Why does this TensorFlow example not have a summation before the activation function?

I'm trying to understand a TensorFlow code snippet. What I've been taught is that we sum all the incoming inputs and then pass the result to an activation function. Shown in the picture below is a single neuron. Notice that we compute a weighted sum of the inputs and THEN compute the activation.

[picture: a single neuron computing a weighted sum of its inputs, followed by an activation function]
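
Written out explicitly, this is what I mean (a tiny sketch with made-up numbers, just to show the two steps):

# One neuron: weighted sum of the inputs, THEN the activation
x = [2.0, 3.0, 1.0]        # inputs (made-up)
w = [0.5, -1.0, 2.0]       # weights (made-up)
b = 0.1                    # bias

z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b   # the summation step
a = max(0.0, z)                                    # ReLU activation applied to the sum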

In most examples of the multi-layer perceptron, they don't include the summation step. I find this very confusing.

Here is an example of one of those snippets:

weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}


# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    # Hidden fully connected layer with 256 neurons
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    # Output fully connected layer with a neuron for each class
    out_layer = tf.nn.relu(tf.matmul(layer_2, weights['out']) + biases['out'])
    return out_layer

In each layer, we first multiply the inputs by the weights. Afterwards, we add the bias term. Then we pass the result to tf.nn.relu. Where does the summation happen? It looks like we've skipped it!

Any help would be really great!

Upvotes: 0

Views: 363

Answers (2)

Whynote

Reputation: 992

The tf.matmul operator performs a matrix multiplication, which means that each element in the resulting matrix is a sum of products (which corresponds exactly to what you describe).

Take a simple example with a row-vector and a column-vector, as would be the case if you had exactly one neuron and an input vector (as per the graphic you shared above):

x = [2, 3, 1]
y = [3, 1, 2]

Then the result would be:

tf.matmul(x, y) = 2*3 + 3*1 + 1*2 = 11

There you can see the weighted sum.

P.S.: tf.multiply performs element-wise multiplication, which is not what we want here.
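
You can check this directly; a minimal sketch (shapes arranged so that x is a row-vector and y a column-vector; with eager execution, e.g. TensorFlow 2.x, the prints show the values right away):

import tensorflow as tf

x = tf.constant([[2., 3., 1.]])          # row-vector, shape [1, 3]
y = tf.constant([[3.], [1.], [2.]])      # column-vector, shape [3, 1]

print(tf.matmul(x, y))                   # [[11.]] = 2*3 + 3*1 + 1*2, the weighted sum
print(tf.multiply(x, tf.transpose(y)))   # [[6. 3. 2.]], element-wise, no summation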

Upvotes: 1

Vlad

Reputation: 8595

The last layer of your model, out_layer, outputs the probabilities of each class, P(y = y_i | x), and has shape [batch_size, n_classes]. To calculate these probabilities, the softmax function is applied. For each single input data point x that your model receives, it outputs a vector of probabilities of size n_classes. You then pick the class with the highest probability by applying argmax to the output vector, class = argmax(P(y | x)), which can be written in TensorFlow as y_pred = tf.argmax(out_layer, 1).
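
For example, turning final-layer outputs into probabilities and a predicted class could look like this (a minimal sketch with made-up logits, assuming eager execution):

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, 0.1],   # shape [batch_size, n_classes]
                      [0.3, 1.7, 0.2]])

probs = tf.nn.softmax(logits, axis=1)    # P(y = y_i | x), each row sums to 1
y_pred = tf.argmax(probs, axis=1)        # [0, 1], the most probable class per sample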

Consider a network with a single layer. You have an input matrix X of shape [n_samples, x_dimension], and you multiply it by some matrix W of shape [x_dimension, model_output]. The summation you're talking about is the dot product between a row of X and a column of W. The output will then have shape [n_samples, model_output]. To this output you apply the activation function (if it is the final layer, you probably want softmax). Perhaps the picture you've shown is a bit misleading.
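
As a quick shape check (a minimal sketch; the sizes are made up, and tf.random_normal is the TensorFlow 1.x API used in the question's snippet):

import tensorflow as tf

n_samples, x_dimension, model_output = 4, 3, 2
X = tf.random_normal([n_samples, x_dimension])      # input matrix
W = tf.random_normal([x_dimension, model_output])   # weight matrix

out = tf.matmul(X, W)   # every entry is a dot product of a row of X with a column of W
print(out.shape)        # (4, 2), i.e. [n_samples, model_output]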

Mathematically, the layer without a bias can be described as the matrix product XW. Suppose that the first row of the matrix X (a single input data point) is

x = (x_1, x_2, ..., x_d)

and the first column of W is

w = (w_1, w_2, ..., w_d)

where d = x_dimension. The result of this dot product is given by

x*w = x_1*w_1 + x_2*w_2 + ... + x_d*w_d

which is your summation. You repeat this for each column in the matrix W, and the result is a vector of size model_output (which corresponds to the number of columns in W). To this vector you add the bias (if needed) and then apply the activation.
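
Spelled out with tiny concrete numbers (a minimal sketch; the values are made up, assuming eager execution so the results can be inspected directly):

import tensorflow as tf

X = tf.constant([[1., 2., 3.]])      # one input data point, shape [1, 3]
W = tf.constant([[0.1, 0.4],
                 [0.2, 0.5],
                 [0.3, 0.6]])        # shape [3, 2], so model_output = 2
b = tf.constant([0.5, -0.5])         # bias, one value per output

manual = 1.*0.1 + 2.*0.2 + 3.*0.3    # = 1.4, the summation for the first column of W
layer = tf.nn.relu(tf.matmul(X, W) + b)
# tf.matmul(X, W) -> [[1.4, 3.2]], layer -> [[1.9, 2.7]]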

Upvotes: 2
