Jude Wells

Reputation: 338

How can I add a bias and an activation function directly to my input layer in TensorFlow?

I would like to create a neural network with no hidden layers, but where each input is summed with a bias and passed through a relu function before going to the softmax output layer. The weight associated with the bias of each input needs to be trainable. You could alternatively think of this as a neural network with one hidden layer in which each hidden node is connected to only one input feature. What I want to achieve is the simplest possible architecture that can learn a threshold function for each input (achieved by the combination of the bias and the relu). Each of these thresholded inputs is then summed in the output nodes, which use softmax for multiclass classification.

I did consider adding a densely connected hidden layer and then a regularization function that sets all weights to zero except one per node, but that approach would still try to train all of the weights that get zeroed out after each update: aside from being inefficient, would this interfere with the training of the weight that does not get set to zero? I know that Keras will automatically add biases to my output layer (this is fine).
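
In other words, for an input vector x the forward pass I want is y = softmax(W · relu(x + b) + c), where b has one trainable entry per input feature and W and c belong to the softmax output layer.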

Below is my code in TensorFlow:

   import tensorflow as tf

   # input_dim = number of input features, output_dim = number of classes
   inputs = tf.keras.layers.Input(shape=(input_dim,))
   outputs = tf.keras.layers.Dense(output_dim, activation='softmax')(inputs)
   model = tf.keras.models.Model(inputs=inputs, outputs=outputs)

[sketch of desired neural network architecture]

Upvotes: 1

Views: 2366

Answers (2)

Jude Wells

Reputation: 338

After looking at the very helpful solution proposed by Zabir Al Nazi, I propose this modification so that the relu activation function is applied to the sum of the bias and the input (and not applied to the bias alone):

from tensorflow.keras.layers import Input, Dense, Add, Activation
from tensorflow.keras.models import Model

n = 3
ip = Input(shape=(n,))

# branch 1: a zero, non-trainable Dense feeding a trainable Dense,
# so the output of d2 is simply its trainable bias vector
d1 = Dense(n, trainable=False, use_bias=False, kernel_initializer='zeros')(ip)
d2 = Dense(n, trainable=True, use_bias=True)(d1)

# branch 2: add the bias to the raw input, then apply relu and softmax
add = Add()([ip, d2])
add = Activation('relu')(add)
act = Activation('softmax')(add)

model = Model(ip, act)

model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 3)]          0                                            
__________________________________________________________________________________________________
dense (Dense)                   (None, 3)            9           input_1[0][0]                    
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 3)            12          dense[0][0]                      
__________________________________________________________________________________________________
add (Add)                       (None, 3)            0           input_1[0][0]                    
                                                                 dense_1[0][0]                    
__________________________________________________________________________________________________
activation (Activation)         (None, 3)            0           add[0][0]                        
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 3)            0           activation[0][0]                 
==================================================================================================
Total params: 21
Trainable params: 12
Non-trainable params: 9
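
As a quick sanity check that this graph computes softmax(relu(x + b)), one can set the bias by hand and compare against a direct computation. This is only a rough check: it continues from the code above and assumes the second Dense layer is named 'dense_1', as in the summary.

import numpy as np
import tensorflow as tf

# overwrite the trainable bias of dense_1, then compare the model output
# with softmax(relu(x + b)) computed directly
b = np.array([0.5, -1.0, 2.0], dtype=np.float32)
dense_1 = model.get_layer('dense_1')
kernel, _ = dense_1.get_weights()          # the kernel is irrelevant: its input is always zero
dense_1.set_weights([kernel, b])

x = np.array([[1.0, 2.0, -3.0]], dtype=np.float32)
expected = tf.nn.softmax(tf.nn.relu(x + b)).numpy()
print(np.allclose(model.predict(x), expected))   # should print True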

Upvotes: 1

Zabir Al Nazi Nabil

Reputation: 11198

Your idea would be quite inefficient to implement directly in Keras, as most modern libraries are built around multiplication-based weight matrices. You can either write a custom layer or use a hack.
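
For reference, the custom-layer route could look roughly like this (a minimal sketch; PerInputBias is just an illustrative name, not a Keras built-in):

import tensorflow as tf

class PerInputBias(tf.keras.layers.Layer):
    """Adds one trainable bias per input feature, then applies relu."""
    def build(self, input_shape):
        self.bias = self.add_weight(name='bias',
                                    shape=(input_shape[-1],),
                                    initializer='zeros',
                                    trainable=True)

    def call(self, inputs):
        return tf.nn.relu(inputs + self.bias)

n, n_classes = 3, 3
ip = tf.keras.layers.Input(shape=(n,))
thresholded = PerInputBias()(ip)
out = tf.keras.layers.Dense(n_classes, activation='softmax')(thresholded)
custom_model = tf.keras.models.Model(ip, out)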

To restate the problem: you have inputs of dimension n, and you want to add a trainable bias to each input, apply relu, and train it that way.

One hacky approach is to use an intermediate branch:

  1. The input is passed to two branches.

  2. In the first branch, the input goes through a non-trainable Dense layer with a zero kernel and no bias.

  3. This yields a vector of zeros, which is then passed to another, trainable Dense layer with use_bias=True, so its output is simply the trainable bias vector, to which relu is applied.

  4. Finally, Add() combines this with the unchanged input and softmax is applied.

from tensorflow.keras.layers import Input, Dense, Add, Activation
from tensorflow.keras.models import Model

n = 3

ip = Input(shape=(n,))

# branch 1: a zero, non-trainable Dense feeding a trainable Dense with use_bias=True,
# so d2's output is just its trainable bias vector; relu is applied to that bias
d1 = Dense(n, trainable=False, use_bias=False, kernel_initializer='zeros')(ip)
d2 = Dense(n, trainable=True, use_bias=True)(d1)
d2 = Activation('relu')(d2)

# branch 2: add the (relu-ed) bias to the unchanged input and apply softmax
add = Add()([ip, d2])
act = Activation('softmax')(add)

model = Model(ip, act)

model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 3)]          0                                            
__________________________________________________________________________________________________
dense (Dense)                   (None, 3)            9           input_1[0][0]                    
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 3)            12          dense[0][0]                      
__________________________________________________________________________________________________
activation (Activation)         (None, 3)            0           dense_1[0][0]                    
__________________________________________________________________________________________________
add (Add)                       (None, 3)            0           input_1[0][0]                    
                                                                 activation[0][0]                 
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 3)            0           add[0][0]                        
==================================================================================================
Total params: 21
Trainable params: 12
Non-trainable params: 9

Upvotes: 3
