Reputation: 473
I am wondering whether I succeeded in translating the following definition from PyTorch to Keras.
In PyTorch, the multi-layer perceptron is defined as:
from torch import nn

hidden = 128

def mlp(size_in, size_out, act=nn.ReLU):
    return nn.Sequential(
        nn.Linear(size_in, hidden),
        act(),
        nn.Linear(hidden, hidden),
        act(),
        nn.Linear(hidden, hidden),
        act(),
        nn.Linear(hidden, size_out),
    )
My translation is:
from tensorflow import keras
from keras import layers

hidden = 128

def mlp(size_in, size_out, act=keras.layers.ReLU):
    return keras.Sequential(
        [
            layers.Dense(hidden, activation=None, name="layer1", input_shape=(size_in, 1)),
            act(),
            layers.Dense(hidden, activation=None, name="layer2", input_shape=(hidden, 1)),
            act(),
            layers.Dense(hidden, activation=None, name="layer3", input_shape=(hidden, 1)),
            act(),
            layers.Dense(size_out, activation=None, name="layer4", input_shape=(hidden, 1))
        ])
I am particularly confused about the input/output shape arguments, because that seems to be where TensorFlow and PyTorch differ.
From the documentation:
When a popular kwarg input_shape is passed, then keras will create an input layer to insert before the current layer. This can be treated equivalent to explicitly defining an InputLayer.
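If I read that correctly, the following two definitions should be equivalent (a minimal sketch with arbitrary sizes):

from tensorflow import keras
from tensorflow.keras import layers

a = keras.Sequential([layers.Dense(8, input_shape=(3,))])         # implicit InputLayer
b = keras.Sequential([keras.Input(shape=(3,)), layers.Dense(8)])  # explicit Input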
So, did I get it right?
Upvotes: 0
Views: 114
Reputation: 40618
In Keras, you can provide an input_shape for the first layer, or alternatively use a tf.keras.layers.Input layer. If you provide neither, the model gets built the first time you call fit, evaluate, or predict, or the first time you call the model on some input data, so the input shape will be inferred automatically if you do not provide it. See the docs for more details. PyTorch, by contrast, generally infers the input shape at runtime.
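As a minimal sketch of that deferred building (the layer size 4, feature size 3, and batch size 2 below are arbitrary):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

m = keras.Sequential([layers.Dense(4)])    # no input shape given
print(m.built)                             # False: weights do not exist yet

_ = m(np.zeros((2, 3), dtype="float32"))   # first call on data builds the model
print(m.built)                             # True
print(m.layers[0].kernel.shape)            # (3, 4): input dim inferred from the data

With an explicit Input layer, the translation looks like this: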
def keras_mlp(size_in, size_out, act=layers.ReLU):
    return keras.Sequential([layers.Input(shape=(size_in,)),
                             layers.Dense(hidden, name='layer1'),
                             act(),
                             layers.Dense(hidden, name='layer2'),
                             act(),
                             layers.Dense(hidden, name='layer3'),
                             act(),
                             layers.Dense(size_out, name='layer4')])
def pytorch_mlp(size_in, size_out, act=nn.ReLU):
    return nn.Sequential(nn.Linear(size_in, hidden),
                         act(),
                         nn.Linear(hidden, hidden),
                         act(),
                         nn.Linear(hidden, hidden),
                         act(),
                         nn.Linear(hidden, size_out))
You can compare their summaries.
For Keras:
>>> keras_mlp(10, 5).summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
layer1 (Dense)               (None, 128)               1408
re_lu_6 (ReLU)               (None, 128)               0
layer2 (Dense)               (None, 128)               16512
re_lu_7 (ReLU)               (None, 128)               0
layer3 (Dense)               (None, 128)               16512
re_lu_8 (ReLU)               (None, 128)               0
layer4 (Dense)               (None, 5)                 645
=================================================================
Total params: 35,077
Trainable params: 35,077
Non-trainable params: 0
_________________________________________________________________
For PyTorch (here, summary is assumed to come from the torchinfo package):
>>> summary(pytorch_mlp(10, 5), (1, 10))
============================================================================
Layer (type:depth-idx)                   Output Shape              Param #
============================================================================
Sequential                               [1, 5]                    --
├─Linear: 1-1                            [1, 128]                  1,408
├─ReLU: 1-2                              [1, 128]                  --
├─Linear: 1-3                            [1, 128]                  16,512
├─ReLU: 1-4                              [1, 128]                  --
├─Linear: 1-5                            [1, 128]                  16,512
├─ReLU: 1-6                              [1, 128]                  --
├─Linear: 1-7                            [1, 5]                    645
============================================================================
Total params: 35,077
Trainable params: 35,077
Non-trainable params: 0
Total mult-adds (M): 0.04
============================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.14
Estimated Total Size (MB): 0.14
============================================================================
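As an extra sanity check (a small sketch; the batch size of 4 is arbitrary, the sizes 10 and 5 match the summaries above), both models map a (batch, 10) input to a (batch, 5) output:

import numpy as np
import torch

x = np.random.rand(4, 10).astype("float32")

print(keras_mlp(10, 5)(x).shape)                      # (4, 5)
print(pytorch_mlp(10, 5)(torch.from_numpy(x)).shape)  # torch.Size([4, 5])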
Upvotes: 2