Reputation: 338
I would like to create a neural network with no hidden layers, but where each input is summed with a bias and passed through a ReLU function before going to the softmax output layer. The weight associated with the bias of each input needs to be trainable. You could alternatively think of this as a neural network with one hidden layer, where each node in the hidden layer is connected to only one input feature.

What I want to achieve is the simplest possible architecture that can learn a threshold function for each input (achieved by the combination of the bias and the ReLU). Each of these thresholded inputs is then summed in the output nodes, which use softmax for multiclass classification.

I did consider adding a densely connected hidden layer and then a regularization function that sets all weights to zero except one per node, but the problem with that approach is that it would still attempt to train all of the weights that get set to zero after each update: aside from being inefficient, would this interfere with the training of the weight that does not get set to zero? I know that Keras will automatically add biases to my output layer (this is fine).
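To illustrate the thresholding I mean, here is a tiny sketch with a made-up bias value (the numbers are purely illustrative, not from any trained model):

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

b = -2.0                 # a learned bias of -2.0 corresponds to a threshold of 2.0
print(relu(1.5 + b))     # 0.0 -> an input below the threshold is suppressed
print(relu(3.0 + b))     # 1.0 -> an input above the threshold passes through (shifted down by 2.0)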
Below is my code in TensorFlow:
import tensorflow as tf

# Current model: inputs feed straight into a softmax output layer.
inputs = tf.keras.layers.Input(shape=(input_dim,))
outputs = tf.keras.layers.Dense(output_dim, activation='softmax')(inputs)
model = tf.keras.models.Model(inputs=inputs, outputs=outputs)
[Sketch of the desired neural network architecture]
Upvotes: 1
Views: 2366
Reputation: 338
After looking at the very helpful solution proposed by Zabir Al Nazi, I propose this modification so that the ReLU activation is applied to the sum of the bias and the input (and not to the bias alone):
from tensorflow.keras.layers import Input, Dense, Add, Activation
from tensorflow.keras.models import Model

n = 3
ip = Input(shape=(n,))
# branch 1: a frozen all-zeros Dense feeds a trainable Dense,
# so d2 outputs only its trainable bias for each input feature
d1 = Dense(n, trainable=False, use_bias=False, kernel_initializer='zeros')(ip)
d2 = Dense(n, trainable=True, use_bias=True)(d1)
# branch 2: add the bias to the raw input, then apply relu to the sum
add = Add()([ip, d2])
add = Activation('relu')(add)
act = Activation('softmax')(add)
model = Model(ip, act)
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 3)]          0
__________________________________________________________________________________________________
dense (Dense)                   (None, 3)            9           input_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 3)            12          dense[0][0]
__________________________________________________________________________________________________
add (Add)                       (None, 3)            0           input_1[0][0]
                                                                 dense_1[0][0]
__________________________________________________________________________________________________
activation (Activation)         (None, 3)            0           add[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 3)            0           activation[0][0]
==================================================================================================
Total params: 21
Trainable params: 12
Non-trainable params: 9
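For completeness, here is a minimal compile-and-fit sketch for this model (the random data and training settings below are just placeholders):

import numpy as np
from tensorflow.keras.utils import to_categorical

# Placeholder data: 10 samples with n features, one-hot targets over n classes.
X = np.random.rand(10, n).astype('float32')
y = to_categorical(np.random.randint(n, size=10), num_classes=n)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, batch_size=2)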
Upvotes: 1
Reputation: 11198
Your idea will be very inefficient in Keras, as most modern libraries are built around multiplication-based weight matrices. You can either write a custom layer or use a hack.

In summary: say you have inputs with n dimensions, and you want to add a trainable bias to each input, apply ReLU, and train it that way.
One hacky approach would be to use an intermediate branch. The input is passed to two branches:

- In the first branch, the input is passed to a non-trainable Dense layer with no bias whose kernel is initialized to zeros.
- So we get a bunch of zeros, which are then passed to another, trainable Dense layer with use_bias=True; its output is therefore just the trainable bias, and ReLU is applied to it.
- Finally, we use Add() with the previous, unchanged input to sum the tensors and apply softmax.
from tensorflow.keras.layers import Input, Dense, Add, Activation
from tensorflow.keras.models import Model

n = 3
ip = Input(shape=(n,))
# branch 1: frozen all-zeros Dense -> trainable Dense, so d2 outputs only its trainable bias
d1 = Dense(n, trainable=False, use_bias=False, kernel_initializer='zeros')(ip)
d2 = Dense(n, trainable=True, use_bias=True)(d1)
d2 = Activation('relu')(d2)
# branch 2: add the thresholded bias to the unchanged input, then apply softmax
add = Add()([ip, d2])
act = Activation('softmax')(add)
model = Model(ip, act)
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 3)]          0
__________________________________________________________________________________________________
dense (Dense)                   (None, 3)            9           input_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 3)            12          dense[0][0]
__________________________________________________________________________________________________
activation (Activation)         (None, 3)            0           dense_1[0][0]
__________________________________________________________________________________________________
add (Add)                       (None, 3)            0           input_1[0][0]
                                                                 activation[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 3)            0           add[0][0]
==================================================================================================
Total params: 21
Trainable params: 12
Non-trainable params: 9
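For comparison, the custom-layer route mentioned at the top might look roughly like this, following the architecture described in the question (a sketch only; the class name and the placeholder sizes are not from any existing code):

from tensorflow.keras.layers import Layer, Input, Dense, Activation
from tensorflow.keras.models import Model

class PerFeatureBias(Layer):
    """Adds one trainable bias per input feature (no multiplicative weights)."""
    def build(self, input_shape):
        self.bias = self.add_weight(name='bias', shape=(input_shape[-1],),
                                    initializer='zeros', trainable=True)
    def call(self, inputs):
        return inputs + self.bias

input_dim, output_dim = 3, 2                      # placeholder sizes
ip = Input(shape=(input_dim,))
x = PerFeatureBias()(ip)                          # x_i + b_i, one trainable b_i per feature
x = Activation('relu')(x)                         # per-input threshold: relu(x_i + b_i)
out = Dense(output_dim, activation='softmax')(x)  # sum the thresholded inputs per class, then softmax
model = Model(ip, out)
model.summary()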
Upvotes: 3