user3761728

Reputation: 97

How to directly apply softmax to logits with tf.keras.activations.softmax vs. using tf.keras.layers.Softmax

I am completely new to this area, so my question may sound stupid. I have the following model defined using Keras, which takes multiple inputs and predicts one of two outcomes:

        inputs = []
        outputs = []
        for feature in features:
            length = feature.length
            input = tf.keras.Input(batch_size=batch_size, shape=(length,), sparse=False,
                                       name=feature.name)
            output = tf.keras.layers.Dense(units=2)(raw_input)
            inputs.append(input)
            outputs.append(output)

        logits = tf.keras.layers.add(outputs)
        ##########################################################
        # Opt1: probabilities = tf.keras.activations.softmax(logits)
        # Opt2: probabilities = tf.keras.layers.Softmax()(logits)
        # Opt3: probabilities = tf.keras.layers.Softmax(name="label")(logits)
        ##########################################################
        model = tf.keras.Model(inputs=inputs, outputs=probabilities)
        model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                      metric=["accuracy"])

I want the model's output to show the probabilities of the two predicted outcomes, so I attempt to apply softmax to the logits as the output.

I have tried the three options shown above in the code (Opt1, Opt2, Opt3):

  - Opt1 gives the following error: ValueError: No data provided for "tf_op_layer_Softmax". Need data for each key in: ['tf_op_layer_Softmax']
  - Opt2 gives a similar error: ValueError: No data provided for "softmax". Need data for each key in: ['softmax']
  - Opt3 runs just fine, even though it is the same as Opt2 except for a different name.

My questions are mainly the following:

  1. In a Keras model, how do we usually apply a softmax directly to logits without creating another layer?
  2. What is the difference between Opt2 and Opt3, since it's just a name change?

Thanks for the help

Upvotes: 0

Views: 1157

Answers (1)

Zabir Al Nazi Nabil

Reputation: 11198

  1. I'm not sure what you mean by "directly", but many layers have an activation parameter which you can use to apply softmax.

For example, in a Dense layer, you can say something like,

dense = tf.keras.layers.Dense(10, activation = 'softmax')(in_layer)

There is no such parameter for add(), so you can use a separate Activation layer instead:

softmax_out = tf.keras.layers.Activation('softmax')(in_layer)
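
For example, putting both approaches side by side on a single input (a minimal sketch; the input name, shape, and model names here are illustrative, not taken from your code):

import tensorflow as tf

in_layer = tf.keras.Input(shape=(10,), name='feat')

# Approach A: the Dense layer applies softmax itself via its activation parameter.
probs_a = tf.keras.layers.Dense(2, activation='softmax')(in_layer)

# Approach B: keep the Dense layer linear (logits) and add a separate Activation layer.
logits = tf.keras.layers.Dense(2)(in_layer)
probs_b = tf.keras.layers.Activation('softmax')(logits)

model_a = tf.keras.Model(in_layer, probs_a)
model_b = tf.keras.Model(in_layer, probs_b)

Both models end in a softmax; they only differ in whether the softmax lives inside the Dense layer or as its own layer.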

  2. You have inconsistencies in your design.

output = tf.keras.layers.Dense(units=2)(raw_input)

Where is this raw_input coming from? As for Opt2 vs Opt3, there's no practical difference between them; it's most probably just the way you've designed the network.
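
As a quick check that the two Softmax layers compute identical values and differ only in name (a sketch with arbitrary dummy values, run eagerly):

import numpy as np
import tensorflow as tf

x = np.random.rand(4, 2).astype('float32')
# Same computation, different layer name.
out_plain = tf.keras.layers.Softmax()(x)
out_named = tf.keras.layers.Softmax(name='label')(x)
print(np.allclose(out_plain.numpy(), out_named.numpy()))  # True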

Here's the fixed version, which works with Opt2.

import numpy as np
import tensorflow as tf

inputs = []
outputs = []

# Minimal stand-in for the feature objects used in the question (length + name).
class Feature:
  def __init__(self, len_ = 10, name_ = 'unk'):
    self.length = len_
    self.name = name_

features = []

for i in range(5):
  f = Feature(name_ = 'unk' + str(i))
  features.append(f)

batch_size = 10

for feature in features:
    length = feature.length
    input = tf.keras.Input(batch_size=batch_size, shape=(length,), sparse=False,
                                name=feature.name)
    # The Dense layer is applied to `input` here, not the undefined `raw_input`.
    output = tf.keras.layers.Dense(units=2)(input)
    inputs.append(input)
    outputs.append(output)

# Sum the per-feature logits and turn them into probabilities (Opt2).
logits = tf.keras.layers.add(outputs)

probabilities = tf.keras.layers.Softmax()(logits)

model = tf.keras.Model(inputs=inputs, outputs=probabilities)
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=["accuracy"])

model.summary()

Model: "model_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
unk0 (InputLayer)               [(10, 10)]           0                                            
__________________________________________________________________________________________________
unk1 (InputLayer)               [(10, 10)]           0                                            
__________________________________________________________________________________________________
unk2 (InputLayer)               [(10, 10)]           0                                            
__________________________________________________________________________________________________
unk3 (InputLayer)               [(10, 10)]           0                                            
__________________________________________________________________________________________________
unk4 (InputLayer)               [(10, 10)]           0                                            
__________________________________________________________________________________________________
dense_17 (Dense)                (10, 2)              22          unk0[0][0]                       
__________________________________________________________________________________________________
dense_18 (Dense)                (10, 2)              22          unk1[0][0]                       
__________________________________________________________________________________________________
dense_19 (Dense)                (10, 2)              22          unk2[0][0]                       
__________________________________________________________________________________________________
dense_20 (Dense)                (10, 2)              22          unk3[0][0]                       
__________________________________________________________________________________________________
dense_21 (Dense)                (10, 2)              22          unk4[0][0]                       
__________________________________________________________________________________________________
add_3 (Add)                     (10, 2)              0           dense_17[0][0]                   
                                                                 dense_18[0][0]                   
                                                                 dense_19[0][0]                   
                                                                 dense_20[0][0]                   
                                                                 dense_21[0][0]                   
__________________________________________________________________________________________________
softmax_3 (Softmax)             (10, 2)              0           add_3[0][0]                      
==================================================================================================
Total params: 110
Trainable params: 110
Non-trainable params: 0
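
To confirm the output behaves like probabilities, here's a short usage sketch with random dummy inputs (reusing the model, features, and batch_size defined above):

dummy = [np.random.rand(batch_size, f.length).astype('float32') for f in features]
probs = model.predict(dummy)
print(probs.shape)        # (10, 2)
print(probs.sum(axis=1))  # each row sums to ~1.0 because of the softmax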

Upvotes: 1
