Reputation: 149
I'm experimenting with a model combining a convolutional neural network with a linear model. Here is a simplified version of it:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, GlobalAveragePooling1D, Dropout, Dense
from tensorflow.keras.experimental import WideDeepModel, LinearModel

num_classes = 1  # (0='NO' or 1='YES')

cnn_model = Sequential()
cnn_model.add(Conv1D(20, 8, padding='same', activation='relu'))
cnn_model.add(GlobalAveragePooling1D())
cnn_model.add(Dropout(0.6))
cnn_model.add(Dense(num_classes, activation='sigmoid'))

linear_model = LinearModel()
combined_model = WideDeepModel(linear_model, cnn_model)
combined_model.compile(optimizer=['sgd', 'adam'],
                       loss=['mse', 'binary_crossentropy'],
                       metrics=['accuracy'])
Performance is very good and everything seemed to be going well until I sorted the predictions by pval: some predictions are > 1, even though I'm using a sigmoid activation, which I thought was supposed to squash everything between 0 and 1 (the linear model has no activation function, but its inputs are all scaled 0-1):
pv = combined_model.predict([dplus_test, X_test])
pval = [a[0] for a in pv]
pred = [1 if a > threshold else 0 for a in pval]
true pred pval dplus
1633 1 1 1.002850 15.22404
1326 1 1 1.001444 10.34983
1289 1 1 1.001368 10.03043
1371 1 1 1.000986 10.74037
1188 1 1 1.000707 8.902
I checked the other end of the data, and those predictions are, as I expected, always > 0:
true pred pval dplus
145 0 0 0.000463 1.81635
383 0 0 0.001023 3.24982
1053 0 0 0.001365 7.22535
This is not a problem so far; nothing crashes and I'm happy with the performance.
I would like to know whether my understanding of the sigmoid activation function is wrong, whether something in the combined model allows values to go above 1, and whether I can trust these results.
Upvotes: 1
Views: 422
Reputation: 1450
It's because your sigmoid is applied only to the output of the deep model, and WideDeepModel combines the two models' outputs by adding them (and your wide linear model can produce arbitrary output). Since you include both mse and binary_crossentropy in your loss, the combined model still learns to output values close to the expected range.
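As a minimal numeric sketch of why the sum can exceed 1 (the branch outputs below are made-up values for illustration):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical outputs for a single sample: the deep branch is squashed
# by its sigmoid, but the wide (linear) branch is unbounded.
deep_out = sigmoid(3.2)    # ~0.9608, always in (0, 1)
linear_out = 0.042         # no activation, can be any real number

# WideDeepModel sums the two branch outputs, so the combined
# prediction can drift outside [0, 1].
print(deep_out + linear_out)  # ~1.0028, slightly above 1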
If you had just binary_crossentropy, you would probably see values much larger than 1: since the loss for a positive example is -p * log(q), where q is the output of your network, you could make the loss arbitrarily small by increasing q indefinitely. That can't happen when your output is bounded.
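You can see this directly by evaluating -log(q) for growing q (a quick illustrative check, not part of the original model):

import numpy as np

# For a positive example (p = 1) the per-sample term is -log(q).
for q in [0.5, 0.9, 1.0, 1.5, 2.0]:
    print(q, -np.log(q))

# -log(q) goes negative once q > 1, so an unbounded output could make
# the loss arbitrarily small; a sigmoid caps q at 1 and prevents this.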
WideDeepModel takes an additional activation argument (see the docs) where you can define the activation function of the whole model. If you want to squeeze the output between 0 and 1, set it to 'sigmoid':
combined_model = WideDeepModel(linear_model, cnn_model, activation='sigmoid')
As a final note: in my experience, combining mean squared error and binary cross-entropy like this doesn't make much sense; in practice you'd choose one or the other.
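Putting both suggestions together, a sketch of the revised setup might look like this (keeping your optimizers, and assuming binary cross-entropy is the single loss you settle on):

combined_model = WideDeepModel(linear_model, cnn_model, activation='sigmoid')
# You may also want to drop the sigmoid on cnn_model's final Dense layer
# so the activation isn't applied twice to the deep branch.
combined_model.compile(optimizer=['sgd', 'adam'],
                       loss='binary_crossentropy',
                       metrics=['accuracy'])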
Upvotes: 1