raghav2956

Reputation: 71

Understanding when to and when not to use Softmax as output layer activation

So I just started working with neural nets and set out to make a basic image classification network with binary labels. From my understanding of neural nets, I thought that the purpose of having the Softmax activation function in the output layer was to convert the incoming information into probabilities of the labels, with the predicted label being the one with the higher probability. So my first question is: why does my model train well when I drop the softmax activation from the output layer, but perform badly when I keep it?
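
To be clear about what I mean by "converting to probabilities", here is a tiny NumPy sketch of what I understand softmax to do (just an illustration, not part of the network below):

import numpy as np

def softmax(z):
    # subtract the max for numerical stability, then normalise the exponentials
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 0.5])   # raw scores for the two labels
print(softmax(logits))          # [0.81757448 0.18242552] -> predicted label is index 0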

I am pretty sure this is some obvious issue that escapes me regarding the network architecture and the various hyperparameters that I use. Would be grateful for your help! I am pasting my code below for you to take a look; I haven't included the output, but let me know if you need that too.

#Imports (assumed from the usage below)
import os
import numpy as np
from PIL import Image
from sklearn.utils import shuffle
from sklearn.preprocessing import OneHotEncoder
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import optimizers

#Train Data
INPUT_FOLDER = '../input/chest-xray-pneumonia/chest_xray/train/NORMAL'
images = os.listdir(INPUT_FOLDER)
X_train_1 = []
for instance in images:
    image = Image.open('../input/chest-xray-pneumonia/chest_xray/train/NORMAL/' + instance)
    image_rz = image.resize((100,100)).convert('L')
    array = np.array(image_rz)
    X_train_1.append(array)
X_train_1 = np.array(X_train_1)
print(X_train_1.shape)

INPUT_FOLDER = '../input/chest-xray-pneumonia/chest_xray/train/PNEUMONIA'
images = os.listdir(INPUT_FOLDER)
X_train_2 = []
for instance in images:
    image = Image.open('../input/chest-xray-pneumonia/chest_xray/train/PNEUMONIA/' + instance)
    image_rz = image.resize((100,100)).convert('L')
    array = np.array(image_rz)
    X_train_2.append(array)
X_train_2 = np.array(X_train_2)
print(X_train_2.shape)
X_trn = np.concatenate((X_train_1, X_train_2))
print(X_trn.shape)

#Make Labels
y_trn = np.zeros(5216, dtype = '<U9') #dtype wide enough for 'PNEUMONIA'; plain str gives '<U1' and truncates labels to one character
y_trn[:1341] = 'NORMAL'
y_trn[1341:] = 'PNEUMONIA'
y_trn = y_trn.reshape(5216,1)

#Shuffle data and labels together
X_trn, y_trn = shuffle(X_trn, y_trn)

#Onehot encode categorical labels
onehot_encoder = OneHotEncoder(sparse=False)
y_trn = onehot_encoder.fit_transform(y_trn)

#Model
model = keras.Sequential([
    keras.layers.Flatten(input_shape = (100,100)),
    keras.layers.Dense(256, activation = 'selu'),
    keras.layers.Dense(2, activation = 'softmax')
])

adm = optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)

model.compile(optimizer = adm,
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

for layer in model.layers:
    print(layer, layer.trainable)

#X_val and y_val are prepared the same way from the validation folder (that code isn't shown here)
model.fit(X_trn, y_trn, validation_data = (X_val, y_val), epochs=30, shuffle = True)



Upvotes: 0

Views: 2890

Answers (1)

Bashir Kazimi

Reputation: 1377

The secret lies in your loss function. When you set from_logits=True in your loss function:

loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True) 

it expects the values to come from a layer without a softmax activation, so it performs the softmax operation itself. If you already have a softmax activation in your final layer, you should not set from_logits to True; set it to False (or leave it out, since False is the default).

Your model works well without the softmax function and badly with it for this reason.
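
As a minimal sketch of the two consistent setups (reusing the layer sizes from the question):

import tensorflow as tf
from tensorflow import keras

# Option 1: keep the softmax output and tell the loss it receives probabilities
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(100, 100)),
    keras.layers.Dense(256, activation='selu'),
    keras.layers.Dense(2, activation='softmax')   # outputs probabilities
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

# Option 2: drop the softmax and let the loss apply it internally
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(100, 100)),
    keras.layers.Dense(256, activation='selu'),
    keras.layers.Dense(2)                         # raw logits, no activation
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

With option 2, apply tf.nn.softmax to the model's predictions if you need class probabilities at inference time.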

Upvotes: 4
