Reputation: 441
I'm following Coursera's Tensorflow course and I cannot understand the below code. Can you explain it using simple English, please?
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
I want to know what tf.nn.softmax does. The course gives the description below, but it is not clear to me.
Sequential: That defines a SEQUENCE of layers in the neural network
Flatten: Remember earlier where our images were a square, when you printed them out? Flatten just takes that square and turns it into a 1-dimensional set.
Dense: Adds a layer of neurons
Each layer of neurons need an activation function to tell them what to do. There's lots of options, but just use these for now.
Relu effectively means "If X>0 return X, else return 0" -- so what it does is pass only values 0 or greater to the next layer in the network.
Softmax takes a set of values, and effectively picks the biggest one, so, for example, if the output of the last layer looks like
[0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05]
, it saves you from fishing through it looking for the biggest value, and turns it into [0,0,0,0,1,0,0,0,0]
-- The goal is to save a lot of coding!
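To see what that description means in practice, here is a small NumPy sketch (NumPy is used only for illustration; the course code runs the same math inside TensorFlow). Note that softmax itself outputs probabilities rather than a literal one-hot vector; argmax picks out the winning index:

```python
import numpy as np

def relu(x):
    # "If X > 0 return X, else return 0", applied element-wise
    return np.maximum(x, 0.0)

def softmax(x):
    # exponentiate, then normalize so the outputs sum to 1
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05])
probs = softmax(scores)
print(np.argmax(probs))  # 4 -- the position of the largest value
print(probs.round(4))    # nearly one-hot, because 9.5 dominates the rest
```

Because 9.5 is so much larger than the other scores, the softmax output is very close to the [0,0,0,0,1,0,0,0,0] the course describes, but in general the entries are probabilities strictly between 0 and 1.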
Upvotes: 3
Views: 10484
Reputation: 96
tf.keras.layers.Dense creates a layer in a neural network. In your code, you have two of these layers. The first layer has 128 "neurons" and uses a special math function called tf.nn.relu for activation. This layer helps the network learn patterns in the data. The second layer has 10 neurons and uses tf.nn.softmax for activation. This layer helps the network make predictions.
Now, let's focus on tf.nn.softmax. It is like a voting system for the 10 neurons in the second layer. It takes the numbers coming from these neurons and turns them into probabilities. Imagine you have 10 numbers representing the network's confidence in different options. These numbers could look like this: [2.0, 3.0, 1.0, 0.1, 2.5, 1.8, 0.5, 1.2, 0.7, 2.2]. When you apply tf.nn.softmax, it rescales these numbers so they add up to 1. It's like asking, "How sure is the network about each option?" After applying tf.nn.softmax, the numbers look like this: [0.11274497, 0.30647259, 0.04147656, 0.01686311, 0.18588502, 0.09230777, 0.0251568, 0.05065958, 0.03072659, 0.13770701]. These new numbers are the probabilities. For example, the network is most confident (30.6%) about the second option (3.0), and not very confident (1.7%) about the fourth option (0.1).
Upvotes: 2
Reputation: 31
Softmax activation takes a real-valued vector as input and converts it into a vector of categorical probabilities. For example, in the case of Fashion-MNIST there are 10 categories: the prediction from a Dense layer would be a real-valued vector, and when that layer is activated with the softmax function it is converted into a probability for each category (all of which add up to 1). In summary, it converts the result into a probability distribution.
Read more about the activation function here - TensorFlow Documentation
Example Code:
from tensorflow.keras.layers import Dense
...
predictions = Dense(10, activation="softmax")
Upvotes: 3
Reputation: 791
Here's the docs: https://www.tensorflow.org/api_docs/python/tf/nn/softmax
Basically, softmax is good for classification. It takes a vector of real numbers and maps it to a vector of probabilities between 0 and 1 that sum to 1. If you want a hard 0-or-1 decision, you then take the class with the largest probability (argmax); softmax itself outputs probabilities, not 0s and 1s.
Take a look at this article here, which also describes the sigmoid and softmax function. The graphs are important. Also a google image search will give some graphs of the function.
http://dataaspirant.com/2017/03/07/difference-between-softmax-function-and-sigmoid-function/
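A rough plain-Python sketch of the difference the article describes: sigmoid squashes a single score into (0, 1) independently, while softmax turns a whole vector of scores into probabilities that sum to 1.

```python
import math

def sigmoid(x):
    # one score -> one probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # a vector of scores -> probabilities over all classes
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(2.0))              # about 0.88
print(softmax([2.0, 1.0, 0.1]))  # three values that sum to 1
```

This is why sigmoid is the usual choice for binary outputs and softmax for multi-class outputs like the 10-way layer in the question.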
Upvotes: 4