Reputation: 8537
I'm solving a multi-class image classification problem. Training a CNN directly did not give good results for the task, so I'm now attempting a workaround. The idea is to train multiple binary classification networks, one for each image class, and then merge their outputs. The output layer of the merged model should be a vector of N+1 elements, where N is the number of image classes. The argmax of the output vector is the classification result (image class). The first element of the output vector holds a constant (bias) value that serves as a classification confidence threshold: the argmax only lands on it when none of the class outputs exceeds it, i.e. when the image fits none of the classes with high probability.
I managed to do what I want by adding another input to the network; this input is initialized to a constant value and serves as the bias. Is it possible to do what I want without changing the input shape of the network, by somehow embedding the constant value in the network itself?
This is my code so far:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.utils import plot_model

inputs = Input(shape=(240, 320), name="img_input")
x = Flatten()(inputs)
outputs1 = Dense(1, activation='sigmoid')(x)  # binary classifier for class 1
outputs2 = Dense(1, activation='sigmoid')(x)  # binary classifier for class 2
bias = Input(shape=(1,), name="bias_input")   # constant value fed in as a second input
merged = Concatenate()([bias, outputs1, outputs2])
model = Model(inputs=[inputs, bias], outputs=merged)
model.summary()
plot_model(model, to_file='network.png', show_shapes=True)
Testing the model:
data = np.ones(320 * 240).reshape((1, 240, 320))
bias = (np.ones(1) / 2.0).reshape((1, 1)) # use 1.0/2 as the bias value
# prints [[0.5 1. 1. ]] as expected
print(model.predict({"img_input": data, "bias_input": bias}))
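For completeness, a rough sketch of how I plan to use the merged output (the argmax doubles as a confidence check; index 0 is the threshold slot, indices 1..N are the per-class outputs):

pred = model.predict({"img_input": data, "bias_input": bias})  # shape (1, N+1)
best = np.argmax(pred[0])
if best == 0:
    # no class output exceeded the constant threshold (0.5 here)
    print("rejected: no class is confident enough")
else:
    print("predicted class index:", best - 1, "score:", pred[0, best])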
Upvotes: 0
Views: 1992
Reputation: 11651
You can simply create a constant value in your model definition:
bias_value = 0.5
# constant column whose batch dimension follows the dense outputs
bias = tf.ones((tf.shape(outputs1)[0], 1)) * bias_value
merged = Concatenate()([bias, outputs1, outputs2])
Using tf.shape(outputs1)[0] ensures that the batch dimension of bias is compatible with the other outputs to be merged.
You also need to update the model construction, as the bias input does not exist anymore.
model = Model(inputs=inputs, outputs=merged)
Running the model with your sample data gives:
>>> model(data)
<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[0.5, 1. , 1. ]], dtype=float32)>
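For reference, a rough end-to-end sketch of the two-class toy model with the constant baked into the graph (no extra input needed) could look like this:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense, Concatenate
from tensorflow.keras.models import Model

inputs = Input(shape=(240, 320), name="img_input")
x = Flatten()(inputs)
outputs1 = Dense(1, activation='sigmoid')(x)
outputs2 = Dense(1, activation='sigmoid')(x)

bias_value = 0.5
# constant column whose batch dimension follows the dense outputs
bias = tf.ones((tf.shape(outputs1)[0], 1)) * bias_value

merged = Concatenate()([bias, outputs1, outputs2])
model = Model(inputs=inputs, outputs=merged)

data = np.ones((1, 240, 320))
print(model(data))  # first column is the constant 0.5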
Upvotes: 1