Reputation: 345
I am trying to import a pretrained model from TensorFlow into PyTorch. It takes a single input and maps it onto a single output. Confusion arises when I try to import the LSTM weights.
I read the weights and their variables from the file with the following function:
import tensorflow as tf

def load_tf_model_weights():
    # Restore the graph from the checkpoint and read out all trainable variables.
    modelpath = 'models/model1.ckpt.meta'
    with tf.Session() as sess:
        tf.train.import_meta_graph(modelpath)
        init = tf.global_variables_initializer()
        sess.run(init)
        vars = tf.trainable_variables()
        W = sess.run(vars)
    return W, vars

W, V = load_tf_model_weights()
Then I inspect the shapes of the weights:
In [33]: [w.shape for w in W]
Out[33]: [(51, 200), (200,), (100, 200), (200,), (50, 1), (1,)]
Furthermore, the variables are defined as:
In [34]: V
Out[34]:
[<tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0' shape=(51, 200) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0' shape=(200,) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0' shape=(100, 200) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0' shape=(200,) dtype=float32_ref>,
<tf.Variable 'weight:0' shape=(50, 1) dtype=float32_ref>,
<tf.Variable 'FCLayer/Variable:0' shape=(1,) dtype=float32_ref>]
So I can say that the first element of W defines the kernel of an LSTM and the second element defines its bias. According to this post, the kernel shape is [input_depth + h_depth, 4 * self._num_units] and the bias shape is [4 * self._num_units]. We already know that input_depth is 1, so it follows that h_depth and _num_units both have the value 50.
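As a quick sanity check (my own sketch, using the shapes reported above), the arithmetic works out:

```python
# Shapes for cell_0 of the TF model, per the formulas above.
input_depth = 1        # single scalar input
num_units = 50         # hidden size; h_depth equals num_units here
h_depth = num_units

kernel_shape = (input_depth + h_depth, 4 * num_units)
bias_shape = (4 * num_units,)

print(kernel_shape)  # (51, 200)
print(bias_shape)    # (200,)
```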
In PyTorch, my LSTMCell, to which I want to assign the weights, looks like this:
In [38]: cell = nn.LSTMCell(1,50)
In [39]: [p.shape for p in cell.parameters()]
Out[39]:
[torch.Size([200, 1]),
torch.Size([200, 50]),
torch.Size([200]),
torch.Size([200])]
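For reference, those four parameters can be listed by name (this is just PyTorch's standard LSTMCell parameter layout):

```python
import torch.nn as nn

cell = nn.LSTMCell(1, 50)
# Parameters are registered in this order: weight_ih, weight_hh, bias_ih, bias_hh.
for name, p in cell.named_parameters():
    print(name, tuple(p.shape))
# weight_ih (200, 1)
# weight_hh (200, 50)
# bias_ih (200,)
# bias_hh (200,)
```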
The first two entries can be covered by the first value of W, which has shape (51, 200). But the LSTMCell from TensorFlow yields only one bias of shape (200,), while PyTorch wants two of them. And by leaving the bias out, I have weights left over:
cell2 = nn.LSTMCell(1,50,bias=False)
[p.shape for p in cell2.parameters()]
Out[43]: [torch.Size([200, 1]), torch.Size([200, 50])]
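For what it's worth, the (51, 200) TF kernel also has to be split before it fits PyTorch's two weight matrices: TF computes [x, h] @ kernel, while PyTorch keeps weight_ih (200, 1) and weight_hh (200, 50) separately and multiplies from the other side. A minimal sketch of the split (tf_kernel stands in for the (51, 200) array from W; the gate ordering inside the 200 columns may also differ between the frameworks, which this sketch ignores):

```python
import numpy as np
import torch
import torch.nn as nn

input_depth, num_units = 1, 50
# Stand-in for W[0], the (51, 200) kernel from the TF checkpoint.
tf_kernel = np.random.randn(input_depth + num_units, 4 * num_units).astype(np.float32)

# Split the kernel rows into input and hidden parts, then transpose each,
# because PyTorch computes x @ weight_ih.T + h @ weight_hh.T.
weight_ih = torch.from_numpy(tf_kernel[:input_depth, :].T.copy())   # (200, 1)
weight_hh = torch.from_numpy(tf_kernel[input_depth:, :].T.copy())   # (200, 50)

cell = nn.LSTMCell(input_depth, num_units)
with torch.no_grad():
    cell.weight_ih.copy_(weight_ih)
    cell.weight_hh.copy_(weight_hh)

print(weight_ih.shape, weight_hh.shape)  # torch.Size([200, 1]) torch.Size([200, 50])
```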
Thanks!
Upvotes: 2
Views: 836
Reputation: 1296
PyTorch follows CuDNN's LSTM formulation under the hood (even when you don't have CUDA, it still uses something compatible), so it has one extra bias term.
You can therefore pick any two numbers whose sum equals 1 (0 and 1, 1/2 and 1/2, or anything else) and set your PyTorch biases to those numbers times TF's bias:
alpha = 0.5  # any value works, since only the sum bias_ih + bias_hh matters
pytorch_bias_1 = torch.from_numpy(alpha * tf_bias_data)
pytorch_bias_2 = torch.from_numpy((1.0 - alpha) * tf_bias_data)
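Putting it together, one possible assignment looks like this (a sketch; tf_bias_data stands in for the (200,) bias array from W):

```python
import numpy as np
import torch
import torch.nn as nn

num_units = 50
# Stand-in for W[1], the (200,) bias from the TF checkpoint.
tf_bias_data = np.random.randn(4 * num_units).astype(np.float32)

alpha = 0.5  # any split works: the cell only ever uses bias_ih + bias_hh
cell = nn.LSTMCell(1, num_units)
with torch.no_grad():
    cell.bias_ih.copy_(torch.from_numpy(alpha * tf_bias_data))
    cell.bias_hh.copy_(torch.from_numpy((1.0 - alpha) * tf_bias_data))

# The effective bias matches TF's single bias.
total = cell.bias_ih + cell.bias_hh
print(torch.allclose(total, torch.from_numpy(tf_bias_data)))  # True
```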
Upvotes: 1