Reputation: 382
I am currently trying to learn deep learning by focusing on Keras and the book "Deep Learning with Python-Keras".
I have an example where I understand the code but not the result, and this is where I need your help. The example is about analyzing movie reviews from the IMDB dataset, which is included in Keras. The code goes as follows:
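import numpy as np
from keras import layers, models
from keras.datasets import imdb

# Assumed loading step (not shown in the original post): keep the 10,000 most
# frequent words so that word indices match dimension=10000 below
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

def vectorize_sequences(sequences, dimension=10000):
    # Multi-hot encoding: one row per review, 1.0 at every word index it contains
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

X_train = vectorize_sequences(train_data)
X_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels)
y_test = np.asarray(test_labels)

model = models.Sequential()
model.add(layers.Dense(16, activation="relu", input_shape=(10000,)))
model.add(layers.Dense(16, activation="relu"))
# A single sigmoid unit: one output value between 0 and 1 per review
model.add(layers.Dense(1, activation="sigmoid"))
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=4, batch_size=512)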
The explanation says that "the final layer will use a sigmoid activation so as to output a probability indicating how likely the sample is to have the target “1”".
I know that the sigmoid function takes values between 0 and 1. Suppose the output of my network is 0.6. Why am I allowed to say that this value gives the probability of the target "1" and not of the target "0"?
I am kind of stuck and need some help :)
Upvotes: 0
Views: 117
Reputation: 1143
The interpretation of your output depends on the labels you used during training. Here, train_labels and test_labels consist of 0s and 1s.
During training, the network is optimized to yield the correct label for each input sequence. So if your output is close to 0 or 1, the network is giving a confident classification. But if your output is, e.g., 0.5, the network is completely unsure which class your input belongs to.
Now take an output of 0.6. Because the positive class is labelled 1, the loss (binary cross-entropy) pushes the output toward 1 for inputs of class 1 and toward 0 for inputs of class 0. The closer the output is to 1, the more confident the network is that the input belongs to class 1, so an output of 0.6 means the network assigns class 1 a probability of about 60 percent.
The probability of class 0 is simply the complement, 1 - 0.6 = 0.4. If you had labelled the classes the other way around during training, the same output would instead be the probability of what is now class 0.
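To make the link between the labels and the probability reading more concrete, here is a minimal sketch in plain Python. The toy data (ten reviews that look identical to the network, six labelled 1 and four labelled 0) and the helper names are made up for illustration. It searches for the constant output that minimizes the mean binary cross-entropy on that group, which ends up being the fraction of samples labelled 1.

```python
import math

def binary_crossentropy(y_true, p):
    """Loss for one sample: -[y*log(p) + (1-y)*log(1-p)]."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# Hypothetical toy data: 10 reviews the network cannot tell apart,
# 6 labelled 1 (positive) and 4 labelled 0 (negative).
labels = [1] * 6 + [0] * 4

def mean_loss(p):
    """Mean loss if the network outputs the same value p for all 10 reviews."""
    return sum(binary_crossentropy(y, p) for y in labels) / len(labels)

# Try the constant outputs 0.01, 0.02, ..., 0.99 and keep the best one.
candidates = [i / 100 for i in range(1, 100)]
best_p = min(candidates, key=mean_loss)

print(best_p)  # 0.6, the fraction of reviews labelled 1
```

Since the loss rewards matching the frequency of label 1, the output reads as the probability of class 1, and the probability of class 0 is one minus that value.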
So in the end, you have two options. First, you can take these values as they are and treat them as the probability that an input belongs to class 1. Second, you can introduce a threshold (0.5 is the natural choice here) and assign the input to class 1 if the output is larger than the threshold, and to class 0 otherwise. The closer your output is to 0.5, the more the network is just guessing the class.
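The second option can be sketched in a few lines of plain Python. The output values and the helper name `to_class` are hypothetical; with the Keras model from the question, the equivalent step would be something like `(model.predict(X_test) > 0.5).astype("int32")`.

```python
def to_class(probability, threshold=0.5):
    """Map a sigmoid output to a hard class label."""
    return 1 if probability > threshold else 0

# Hypothetical sigmoid outputs for four reviews
outputs = [0.03, 0.48, 0.6, 0.97]

classes = [to_class(p) for p in outputs]
print(classes)  # [0, 0, 1, 1]

# Distance from 0.5 as a rough indication of how sure the network is
for p in outputs:
    print(p, to_class(p), round(abs(p - 0.5), 2))
```

The outputs 0.48 and 0.6 end up in different classes even though both are close to 0.5, which is why the raw value is worth keeping next to the hard label.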
The choice of the threshold has a direct influence on the performance of your classifier. This can be evaluated, for example, with a ROC curve (https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
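As a rough illustration of how the threshold moves you along such a curve, the sketch below computes the false positive rate and true positive rate at a few thresholds. The labels, scores, and the helper `roc_point` are invented for this example; in practice a library routine such as scikit-learn's `roc_curve` would normally be used instead.

```python
def roc_point(y_true, scores, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    predicted = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p == 1)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives

# Hypothetical true labels and sigmoid outputs for eight reviews
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.45, 0.7, 0.4, 0.6, 0.8, 0.9]

for threshold in (0.2, 0.5, 0.8):
    fpr, tpr = roc_point(y_true, scores, threshold)
    print(threshold, fpr, tpr)
# 0.2 0.75 1.0
# 0.5 0.25 0.75
# 0.8 0.0 0.25
```

A low threshold catches every positive review but mislabels many negatives, while a high threshold does the opposite. Each threshold gives one point on the ROC curve.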
Upvotes: 1