Reputation: 23
I want to implement a classifier with a sparse input layer. My data has about 60 dimensions and I want to check for feature importance. To do this I want the first layer to have a diagonal weight matrix (to which I want to apply an L1 kernel regularizer); all off-diagonal entries should be non-trainable zeros. So a one-to-one connection per input channel, whereas a Dense layer would mix the input variables. I checked Specify connections in NN (in keras) and Custom connections between layers Keras. The latter one I could not use, as Lambda layers do not introduce trainable weights.
Something like this, however, does not affect the actual weight matrix:
import tensorflow as tf
from keras import backend as K
from keras.layers import Layer

class MyLayer(Layer):
    def __init__(self, output_dim, connection, **kwargs):
        self.output_dim = output_dim
        self.connection = connection
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        self.kernel = tf.linalg.tensor_diag_part(self.kernel)
        self.kernel = tf.linalg.tensor_diag(self.kernel)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
When I train the model and print the weights, I do not get a diagonal matrix for the first layer.
What am I doing wrong?
Upvotes: 2
Views: 2267
Reputation: 2621
Not quite sure what you want to do exactly, because, to me, a diagonal matrix implies a square matrix, meaning your layer's input and output dimensionality should be unchanged.
Anyway, let's talk about the square matrix case first. I think there are two ways of implementing a weight matrix with all zero values off the diagonal.
Method 1: only conceptually follow the square matrix idea, and implement this layer with a trainable weight vector as follows.
# instead of writing y = K.dot(x, W),
# where W is the NxN weight matrix with zero values off the diagonal,
# write y = x * w, where w is the weight vector 1xN
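A minimal sketch of that idea, assuming the Keras 2 Layer API (the class name DiagonalDense and the l1 argument are just illustrative, not fixed names):

from keras import regularizers
from keras.layers import Layer

class DiagonalDense(Layer):
    """One trainable weight per input feature: y = x * w (elementwise)."""
    def __init__(self, l1=0.01, **kwargs):
        self.l1 = l1
        super(DiagonalDense, self).__init__(**kwargs)

    def build(self, input_shape):
        # a length-N weight vector instead of an NxN matrix
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1],),
                                      initializer='ones',
                                      regularizer=regularizers.l1(self.l1),
                                      trainable=True)
        super(DiagonalDense, self).build(input_shape)

    def call(self, x):
        # broadcasted elementwise product: one weight scales one input feature
        return x * self.kernel

    def compute_output_shape(self, input_shape):
        return input_shape

The L1 penalty on this vector then directly shrinks the per-feature weights you want to inspect for importance.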
Method 2: use the default Dense layer, but with your own constraint.
# all you need is to create a mask matrix M, which is an NxN identity matrix,
# and then you can write a constraint like below
from keras import backend as K
from keras.constraints import Constraint

class DiagonalWeight(Constraint):
    """Constrains the weights to be diagonal."""
    def __call__(self, w):
        N = K.int_shape(w)[-1]
        m = K.eye(N)
        w *= m  # zero out everything off the diagonal
        return w
Of course, you should use Dense(..., kernel_constraint=DiagonalWeight()).
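For example, wired into a small model of the kind described in the question (the 60-feature input, layer sizes, and l1 strength below are placeholders I chose for illustration):

from keras import regularizers
from keras.layers import Dense, Input
from keras.models import Model

inputs = Input(shape=(60,))
# square Dense layer forced to stay diagonal, with an L1 penalty on its kernel
x = Dense(60,
          use_bias=False,
          kernel_regularizer=regularizers.l1(0.01),
          kernel_constraint=DiagonalWeight())(inputs)
x = Dense(32, activation='relu')(x)
outputs = Dense(1, activation='sigmoid')(x)
model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy')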
Upvotes: 4