dba

Reputation: 345

Import LSTM from Tensorflow to PyTorch by hand

I am trying to import a pretrained model from TensorFlow into PyTorch. It takes a single input and maps it onto a single output. The confusion comes up when I try to import the LSTM weights.

I read the weights and their variables from the file with the following function:

import tensorflow as tf

def load_tf_model_weights():
    modelpath = 'models/model1.ckpt.meta'

    with tf.Session() as sess:
        # rebuild the graph from the meta file
        tf.train.import_meta_graph(modelpath)
        init = tf.global_variables_initializer()
        sess.run(init)
        # fetch the trainable variables and evaluate them to numpy arrays
        vars = tf.trainable_variables()
        W = sess.run(vars)

    return W, vars

W, V = load_tf_model_weights()

Then I inspect the shapes of the weights:

In [33]:  [w.shape for w in W]
Out[33]: [(51, 200), (200,), (100, 200), (200,), (50, 1), (1,)]

Furthermore, the variables are defined as:

In [34]:    V
Out[34]: 
[<tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0' shape=(51, 200) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0' shape=(200,) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0' shape=(100, 200) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0' shape=(200,) dtype=float32_ref>,
<tf.Variable 'weight:0' shape=(50, 1) dtype=float32_ref>,
<tf.Variable 'FCLayer/Variable:0' shape=(1,) dtype=float32_ref>]

So I can say that the first element of W defines the kernel of an LSTM and the second element defines its bias. According to this post, the shape of the kernel is defined as [input_depth + h_depth, 4 * self._num_units] and the bias as [4 * self._num_units]. We already know that input_depth is 1, so we get that h_depth and _num_units both have the value 50.
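
A quick sanity check of that shape arithmetic (a minimal sketch; the variable names are only for illustration):

# from cell_0: kernel shape (51, 200), bias shape (200,)
input_depth = 1
num_units = 200 // 4          # 4 * num_units == 200  ->  num_units == 50
h_depth = 51 - input_depth    # input_depth + h_depth == 51  ->  h_depth == 50
assert num_units == h_depth == 50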

In PyTorch, my LSTMCell, to which I want to assign the weights, looks like this:

In [38]: cell = nn.LSTMCell(1,50)
In [39]: [p.shape for p in cell.parameters()]
Out[39]: 
[torch.Size([200, 1]),
torch.Size([200, 50]),
torch.Size([200]),
torch.Size([200])]

The first two entries can be covered by the first value of W, which has the shape (51, 200). But the LSTM cell from TensorFlow yields only one bias of shape (200,), while PyTorch wants two of them.
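
For reference, splitting and transposing that kernel covers those first two shapes (a sketch assuming W[0] is cell_0's kernel; any difference in gate ordering between the two frameworks is not handled here):

kernel = W[0]            # TF kernel of cell_0, shape (51, 200)
w_ih = kernel[:1, :].T   # (200, 1)  -> same shape as weight_ih
w_hh = kernel[1:, :].T   # (200, 50) -> same shape as weight_hh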

And if I leave the bias out, I have weights left over:

cell2 = nn.LSTMCell(1,50,bias=False)
[p.shape for p in cell2.parameters()]
Out[43]: [torch.Size([200, 1]), torch.Size([200, 50])]

Thanks!

Upvotes: 2

Views: 836

Answers (1)

Separius

Reputation: 1296

PyTorch uses cuDNN's LSTM under the hood (even when you don't have CUDA, it still uses something compatible), so it has one extra bias term.

So you can pick two numbers whose sum equals 1 (0 and 1, 1/2 and 1/2, or anything else) and set your PyTorch biases to those numbers times TF's bias.

pytorch_bias_1 = torch.from_numpy(alpha * tf_bias_data)
pytorch_bias_2 = torch.from_numpy((1.0-alpha) * tf_bias_data)
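
Applied to the cell and the weights from the question, a minimal sketch could look like this (alpha = 0.5 and the use of W[1] as cell_0's bias are assumptions, and the gate ordering of the kernel is not addressed here):

import torch

alpha = 0.5                      # any split whose parts sum to 1 works
tf_bias_data = W[1]              # TF bias of cell_0, shape (200,)

with torch.no_grad():
    # nn.LSTMCell keeps two bias vectors; their sum takes the role of TF's single bias
    cell.bias_ih.copy_(torch.from_numpy(alpha * tf_bias_data))
    cell.bias_hh.copy_(torch.from_numpy((1.0 - alpha) * tf_bias_data))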

Upvotes: 1
