Reputation: 30677
I'm trying to make the most basic of basic neural networks to get familiar with the functional API in TensorFlow 2.x.
Basically, what I'm trying to do is the following with my simplified iris dataset (i.e. setosa or not).
However, I can't figure out how to control one key aspect of the model. That is, how can I ensure that each feature from my input layer contributes to only one neuron in my subsequent dense layer? Also, how can I allow a feature to contribute to more than one neuron?
This isn't clear to me from the documentation.
# Load data
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
X, y = load_iris(return_X_y=True, as_frame=True)
X = X.astype("float32")
X.index = X.index.map(lambda i: "iris_{}".format(i))
X.columns = X.columns.map(lambda j: j.split(" (")[0].replace(" ","_"))
y.index = X.index
y = y.map(lambda i:iris.target_names[i])
y_simplified = y.map(lambda i: {True:1, False:0}[i == "setosa"])
y_simplified = pd.get_dummies(y_simplified)  # one-hot encode; note get_dummies ignores the `columns` argument for Series input
# Train/test split
from sklearn.model_selection import train_test_split
seed=0
X_train, X_test, y_train, y_test = train_test_split(X, y_simplified, test_size=0.3, random_state=seed)
# Simple neural network
import tensorflow as tf
tf.random.set_seed(seed)
# Input[4 features] -> Dense layer of 3 neurons -> Activation function -> Dense layer of 2 (one per class) -> Softmax
inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(3)(inputs)
x = tf.keras.layers.Activation(tf.nn.sigmoid)(x)
x = tf.keras.layers.Dense(2)(x)
outputs = tf.keras.layers.Activation(tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="simple_binary_iris")
model.compile(loss="binary_crossentropy", metrics=["accuracy"] )
model.summary()
history = model.fit(X_train, y_train, batch_size=64, epochs=10, validation_split=0.2)
test_scores = model.evaluate(X_test, y_test)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
Results:
Model: "simple_binary_iris"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_44 (InputLayer)        [(None, 4)]               0
_________________________________________________________________
dense_96 (Dense)             (None, 3)                 15
_________________________________________________________________
activation_70 (Activation)   (None, 3)                 0
_________________________________________________________________
dense_97 (Dense)             (None, 2)                 8
_________________________________________________________________
activation_71 (Activation)   (None, 2)                 0
=================================================================
Total params: 23
Trainable params: 23
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
2/2 [==============================] - 0s 40ms/step - loss: 0.6344 - accuracy: 0.6667 - val_loss: 0.6107 - val_accuracy: 0.7143
Epoch 2/10
2/2 [==============================] - 0s 6ms/step - loss: 0.6302 - accuracy: 0.6667 - val_loss: 0.6083 - val_accuracy: 0.7143
Epoch 3/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6278 - accuracy: 0.6667 - val_loss: 0.6056 - val_accuracy: 0.7143
Epoch 4/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6257 - accuracy: 0.6667 - val_loss: 0.6038 - val_accuracy: 0.7143
Epoch 5/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6239 - accuracy: 0.6667 - val_loss: 0.6014 - val_accuracy: 0.7143
Epoch 6/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6223 - accuracy: 0.6667 - val_loss: 0.6002 - val_accuracy: 0.7143
Epoch 7/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6209 - accuracy: 0.6667 - val_loss: 0.5989 - val_accuracy: 0.7143
Epoch 8/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6195 - accuracy: 0.6667 - val_loss: 0.5967 - val_accuracy: 0.7143
Epoch 9/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6179 - accuracy: 0.6667 - val_loss: 0.5953 - val_accuracy: 0.7143
Epoch 10/10
2/2 [==============================] - 0s 7ms/step - loss: 0.6166 - accuracy: 0.6667 - val_loss: 0.5935 - val_accuracy: 0.7143
2/2 [==============================] - 0s 607us/step - loss: 0.6261 - accuracy: 0.6444
Test loss: 0.6261375546455383
Test accuracy: 0.644444465637207
Upvotes: 2
Views: 1057
Reputation: 16856
how can I ensure that each feature from my input layer contributes to only one neuron in my subsequent dense layer?
Have one input layer per feature and feed each input layer to a separate dense layer. Later you can concatenate the output of all the dense layers and proceed.
NOTE: One neuron can take an input of any size (in this case the input size is 1, as you want one feature to be used by the neuron), and its output size is always 1. A Dense layer with n units will have n neurons, and so will have an output size of n.
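As a quick illustration of that note (a minimal sketch; the input size of 7 is arbitrary):
import tensorflow as tf

x = tf.keras.Input(shape=(7,))   # any input size works
y = tf.keras.layers.Dense(3)(x)  # 3 units -> 3 neurons -> output size 3
print(y.shape)                   # (None, 3)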
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Model architecture
x1 = tf.keras.Input(shape=(1,))
x2 = tf.keras.Input(shape=(1,))
x3 = tf.keras.Input(shape=(1,))
x4 = tf.keras.Input(shape=(1,))
x1_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x1)
x2_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x2)
x3_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x3)
x4_ = tf.keras.layers.Dense(3, activation=tf.nn.relu)(x4)
merged = tf.keras.layers.concatenate([x1_, x2_, x3_, x4_])
merged = tf.keras.layers.Dense(16, activation=tf.nn.relu)(merged)
outputs = tf.keras.layers.Dense(3, activation=tf.nn.softmax)(merged)
model = tf.keras.Model(inputs=[x1, x2, x3, x4], outputs=outputs)
model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# Load and prepare data
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Fit the model
model.fit([X_train[:,0],X_train[:,1],X_train[:,2],X_train[:,3]], y_train, batch_size=64, epochs=100, validation_split=0.25)
# Evaluate the model
test_scores = model.evaluate([X_test[:,0],X_test[:,1],X_test[:,2],X_test[:,3]], y_test)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
Output:
Epoch 1/100
2/2 [==============================] - 0s 75ms/step - loss: 1.6446 - accuracy: 0.4359 - val_loss: 1.6809 - val_accuracy: 0.5185
Epoch 2/100
2/2 [==============================] - 0s 10ms/step - loss: 1.4151 - accuracy: 0.6154 - val_loss: 1.4886 - val_accuracy: 0.5556
Epoch 3/100
2/2 [==============================] - 0s 9ms/step - loss: 1.2725 - accuracy: 0.6795 - val_loss: 1.3813 - val_accuracy: 0.5556
Epoch 4/100
2/2 [==============================] - 0s 9ms/step - loss: 1.1829 - accuracy: 0.6795 - val_loss: 1.2779 - val_accuracy: 0.5926
Epoch 5/100
2/2 [==============================] - 0s 10ms/step - loss: 1.0994 - accuracy: 0.6795 - val_loss: 1.1846 - val_accuracy: 0.5926
Epoch 6/100
.................. [ Truncated ]
Epoch 100/100
2/2 [==============================] - 0s 2ms/step - loss: 0.4049 - accuracy: 0.9333
Test loss: 0.40491223335266113
Test accuracy: 0.9333333373069763
Upvotes: 1
Reputation: 11895
Dense layers in Keras/TF are fully connected layers. For example, when you use a Dense layer as follows,
inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(3)(inputs)
all 4 input neurons are connected to all 3 output neurons.
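You can see this full connectivity in the parameter count (a small check; it reuses the shapes from the snippet above):
import tensorflow as tf

inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(3)(inputs)
model = tf.keras.Model(inputs, x)
# 4 inputs x 3 neurons = 12 weights, plus 3 biases = 15 parameters,
# matching the first Dense layer in the question's model.summary()
print(model.count_params())  # 15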
There isn't any predefined layer in Keras/TF to specify how to connect input and output neurons. However, Keras/TF is very flexible in that it allows you to define your custom layers easily.
Borrowing the idea from this answer, you could define a CustomConnected layer as follows:
class CustomConnected(tf.keras.layers.Dense):
    def __init__(self, units, connections, **kwargs):
        # Binary mask of shape (n_inputs, units): 1 keeps a connection, 0 severs it
        self.connections = connections
        super(CustomConnected, self).__init__(units, **kwargs)

    def call(self, inputs):
        # Mask the kernel on each call instead of overwriting self.kernel,
        # so the underlying weight variable keeps receiving gradients
        output = tf.matmul(inputs, self.kernel * self.connections)
        if self.use_bias:
            output = tf.nn.bias_add(output, self.bias)
        if self.activation is not None:
            output = self.activation(output)
        return output
Using this layer, you can then specify the connections between two layers through the connections argument. For example:
import numpy as np

inputs = tf.keras.Input(shape=(4,))
connections = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1],
                        [0, 0, 1]], dtype="float32")  # float32 so the mask can multiply the kernel
x = CustomConnected(3, connections)(inputs)
Here, the 1st, 2nd, and 3rd input neurons are connected to the 1st, 2nd, and 3rd output neurons, respectively. Additionally, the 4th input neuron is connected to the 3rd output neuron.
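As a quick sanity check (a sketch with made-up input; it relies on the CustomConnected layer defined above), a severed connection should have no effect on the neurons it does not feed:
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(4,))
connections = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1],
                        [0, 0, 1]], dtype="float32")
model = tf.keras.Model(inputs, CustomConnected(3, connections)(inputs))

# Perturbing feature 0 (wired only to neuron 0) must leave neurons 1 and 2 unchanged
x = np.random.rand(1, 4).astype("float32")
x_perturbed = x.copy()
x_perturbed[0, 0] += 1.0
print(model(x).numpy()[0, 1:] - model(x_perturbed).numpy()[0, 1:])  # [0. 0.]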
UPDATE: As discussed in the comments section, an adaptive approach (e.g. using only the maximum weight for each output neuron) is also possible but not recommended. You could implement it via the following layer:
class CustomSparse(tf.keras.layers.Dense):
    def __init__(self, units, **kwargs):
        super(CustomSparse, self).__init__(units, **kwargs)

    def call(self, inputs):
        nb_in, nb_out = self.kernel.shape
        argmax = tf.argmax(self.kernel, axis=0)  # Shape=(nb_out,)
        argmax_onehot = tf.transpose(tf.one_hot(argmax, depth=nb_in))  # Shape=(nb_in, nb_out)
        kernel_max = self.kernel * argmax_onehot
        # tf.print(kernel_max)  # Uncomment this line to print the weights
        out = tf.matmul(inputs, kernel_max)
        if self.bias is not None:
            out += self.bias
        if self.activation is not None:
            out = self.activation(out)
        return out
The main issue of this approach is that you cannot propagate gradients through the argmax operation required to select the maximum weight. As a result, the network will only "switch input neurons" when the selected weight is no longer the maximum weight.
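To make that concrete (a minimal sketch, not part of the original answer): tf.argmax returns integer indices, so the derived mask behaves like a constant and only the currently selected weights receive a gradient:
import tensorflow as tf

w = tf.Variable([[0.2, 0.9],
                 [0.7, 0.1]])
with tf.GradientTape() as tape:
    argmax = tf.argmax(w, axis=0)                     # integer output: no gradient path
    mask = tf.transpose(tf.one_hot(argmax, depth=2))  # acts as a constant
    loss = tf.reduce_sum(w * mask)
print(tape.gradient(loss, w))
# [[0. 1.]
#  [1. 0.]] only the selected (maximum) weights get a signal; the unselected
# weights are never pushed to grow and take over the selection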
Upvotes: 0