Reputation: 483
If I have a keras layer L, and I want to stack N versions of this layer (with different weights) in a keras model, what's the best way to do that? Please note that here N is large and controlled by a hyperparameter. If N is small then this is not a problem (we can just manually repeat a line N times). So let's assume N > 10, for example.
If the layer has only one input and one output, I can do something like:
m = Sequential()
for i in range(N):
    m.add(L)
But this doesn't work if my layer actually takes multiple inputs. For example, if my layer has the form z = L(x, y), and I would like my model to do:
x_1 = L(x_0, y)
x_2 = L(x_1, y)
...
x_N = L(x_{N-1}, y)
Then Sequential wouldn't do the job. I think I can subclass a keras model, but I don't know what's the cleanest way to put N layers into the class. I can use a list, for example:
class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layers = []
        for i in range(N):
            self.layers.append(L)

    def call(self, inputs):
        x = inputs[0]
        y = inputs[1]
        for i in range(N):
            x = self.layers[i](x, y)
        return x
But this is not ideal: keras won't recognize these layers (it doesn't seem to treat a list of layers as "checkpointable"). For example, MyModel.variables would be empty, and MyModel.save() won't save anything.
I also tried to define the model using the functional API, but it doesn't work in my case either. In fact, if we do
def MyModel():
    input = Input(shape=...)
    output = SomeLayer(input)
    return Model(inputs=input, outputs=output)
It won't run if SomeLayer itself is a custom model (it raises NotImplementedError).
Any suggestions?
Upvotes: 6
Views: 7093
Reputation: 181
After having researched this extensively, I'm certain that there is no built-in, universal way to do this in TensorFlow/Keras.
There are, however, still ways to achieve the same goal by different means. Since no single solution works with any Keras layer, you'll have to approach it on a case-by-case basis.
For example, suppose you want to stack a bunch of Dense layers, with one layer corresponding to each slice along some dimension of your input (a simple case). What you would do instead is construct a custom Dense layer, add an extra dimension to its weights and biases, and then perform the appropriate operations given that extra dimension in your input.
Ultimately the same (desired) operations are performed in the way you want them to be: each slice of the input along that dimension goes through a separate Dense layer with separate weights/biases, but it all runs concurrently, without any Python looping. In essence, you reduce the size and complexity of the graph and perform the same operations in a more vectorised way; this ought to be much more efficient.
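Here is a minimal sketch of what such a "stacked" Dense layer could look like (my own illustration under the assumptions above, not code from any library): the kernel and bias get an extra leading dimension of size n_stack, and a single einsum applies each weight slice to the matching input slice, so no Python loop appears in the graph.
import tensorflow as tf

class StackedDense(tf.keras.layers.Layer):
    """Hypothetical layer: n_stack independent Dense transforms applied in one op."""
    def __init__(self, n_stack, units, **kwargs):
        super(StackedDense, self).__init__(**kwargs)
        self.n_stack = n_stack
        self.units = units

    def build(self, input_shape):
        # Expects inputs of shape (batch, n_stack, in_dim).
        in_dim = int(input_shape[-1])
        self.kernel = self.add_weight(name='kernel',
                                      shape=(self.n_stack, in_dim, self.units),
                                      initializer='glorot_uniform')
        self.bias = self.add_weight(name='bias',
                                    shape=(self.n_stack, self.units),
                                    initializer='zeros')

    def call(self, inputs):
        # For each stack index s: out[:, s, :] = inputs[:, s, :] @ kernel[s] + bias[s]
        return tf.einsum('bsi,sio->bso', inputs, self.kernel) + self.bias
An input of shape (batch, n_stack, in_dim) comes out as (batch, n_stack, units): each slice keeps its own weights, but the whole stack executes as one batched matrix multiplication.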
The strategy outlined here generalises to any layer/input type. Still, it's not great news: it would be of very high value to us (users) if there were some standardised, Keras-friendly way of stacking a bunch of layers and passing input to them concurrently, without Python looping, i.e. something that concatenates the internal parameters along an extra "stacking" dimension and manages alignment between that dimension in the inputs and in the parameters. In the same way that we have tf.keras.Sequential, we could also benefit from something like tf.keras.Parallel as a universal solution to this common ML need.
Upvotes: 1
Reputation: 358
If I understand your question correctly, you can solve this problem simply by using a for-loop when building the model. I'm not sure whether you need any special layers, so I will assume you only use Dense here:
def MyModel():
    input = Input(shape=...)
    x = input
    for i in range(N):
        x = Dense(number_of_nodes, name='dense_%i' % i)(x)
        # Or some custom layers
    output = Dense(number_of_output)(x)
    return Model(inputs=input, outputs=output)
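The same loop also covers the two-input case z = L(x, y) from the question, as long as a fresh layer instance is created on each iteration so that every step gets its own weights. Here is a rough sketch; TwoInputDense and all shapes are placeholders of mine, not the asker's actual layer:
import tensorflow as tf
from tensorflow.keras import Input, Model, layers

class TwoInputDense(layers.Layer):
    """Toy stand-in for the asker's layer L: z = relu(x @ Wx + y @ Wy + b)."""
    def __init__(self, units, **kwargs):
        super(TwoInputDense, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shapes):
        x_shape, y_shape = input_shapes  # the layer is called on a list [x, y]
        self.wx = self.add_weight(name='wx', shape=(int(x_shape[-1]), self.units))
        self.wy = self.add_weight(name='wy', shape=(int(y_shape[-1]), self.units))
        self.b = self.add_weight(name='b', shape=(self.units,), initializer='zeros')

    def call(self, inputs):
        x, y = inputs
        return tf.nn.relu(tf.matmul(x, self.wx) + tf.matmul(y, self.wy) + self.b)

def build_stacked(N, dim=32):
    x_in = Input(shape=(dim,))
    y_in = Input(shape=(dim,))
    x = x_in
    for i in range(N):
        # A new layer instance per iteration, so N separate sets of weights.
        x = TwoInputDense(dim, name='L_%d' % i)([x, y_in])
    return Model(inputs=[x_in, y_in], outputs=x)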
Upvotes: 0
Reputation: 1612
Not sure if I've got your question right, but I guess you could use the functional API and the concatenate or add layers, as shown in the Keras applications (e.g. ResNet50 or InceptionV3), to build "non-sequential" networks.
UPDATE
In one of my projects, I was using something like this. I had a custom layer (it was not implemented in my version of Keras, so I've just manually "backported" the code into my notebook).
# Imports assumed for this snippet; adjust to your Keras version.
import tensorflow as tf
from keras import backend as K
from keras.layers import Layer

class LeakyReLU(Layer):
    """Leaky version of a Rectified Linear Unit backported from a newer Keras
    version."""

    def __init__(self, alpha=0.3, **kwargs):
        super(LeakyReLU, self).__init__(**kwargs)
        self.supports_masking = True
        self.alpha = K.cast_to_floatx(alpha)

    def call(self, inputs):
        return tf.maximum(self.alpha * inputs, inputs)

    def get_config(self):
        config = {'alpha': float(self.alpha)}
        base_config = super(LeakyReLU, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def compute_output_shape(self, input_shape):
        return input_shape
Then, the model:
def create_model(input_shape, output_size, alpha=0.05, reg=0.001):
    inputs = Input(shape=input_shape)
    x = Conv2D(16, (3, 3), padding='valid', strides=(1, 1),
               kernel_regularizer=l2(reg), kernel_constraint=maxnorm(3),
               activation=None)(inputs)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(32, (3, 3), padding='valid', strides=(1, 1),
               kernel_regularizer=l2(reg), kernel_constraint=maxnorm(3),
               activation=None)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(64, (3, 3), padding='valid', strides=(1, 1),
               kernel_regularizer=l2(reg), kernel_constraint=maxnorm(3),
               activation=None)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(128, (3, 3), padding='valid', strides=(1, 1),
               kernel_regularizer=l2(reg), kernel_constraint=maxnorm(3),
               activation=None)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(256, (3, 3), padding='valid', strides=(1, 1),
               kernel_regularizer=l2(reg), kernel_constraint=maxnorm(3),
               activation=None)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=alpha)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Flatten()(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(500, activation='relu', kernel_regularizer=l2(reg))(x)
    x = Dense(output_size, activation='linear', kernel_regularizer=l2(reg))(x)
    model = Model(inputs=inputs, outputs=x)
    return model
Finally, a custom metric:
def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
I was using the following snippet to create and compile the model:
model = create_model(input_shape=X.shape[1:], output_size=y.shape[1])
model.compile(loss=root_mean_squared_error, optimizer='adamax')
As usual, I was using a checkpoint callback to save the model. To load the model, you need to pass the custom layer classes and the custom metric into the load_model function as well:
def load_custom_model(path):
    return load_model(path, custom_objects={
        'LeakyReLU': LeakyReLU,
        'root_mean_squared_error': root_mean_squared_error
    })
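For completeness, here is a rough sketch of the checkpoint workflow mentioned above; the file name, monitored quantity, and fit arguments are my assumptions, not the exact settings used:
from keras.callbacks import ModelCheckpoint

# Save the best model seen so far during training (hypothetical settings).
checkpoint = ModelCheckpoint('model.h5', monitor='val_loss', save_best_only=True)
model.fit(X, y, validation_split=0.1, epochs=100, callbacks=[checkpoint])

# The custom layer and metric must be supplied again when loading:
model = load_custom_model('model.h5')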
Does it help?
Upvotes: 4