Reputation: 31
Hello, I have a question about TensorFlow. I have trained some LSTM models and I can access the weights and biases of the synaptic connections, but I can't seem to access the input, new input, output and forget gate weights of the LSTM cell. I can get the gate tensors out, but when I try to .eval() them in a Session I get errors. I'm using the BasicLSTMCell class found in tensorflow/python/ops/ for my network:
class BasicLSTMCell(RNNCell):
  """Basic LSTM recurrent network cell.

  The implementation is based on:

  We add forget_bias (default: 1) to the biases of the forget gate in order to
  reduce the scale of forgetting in the beginning of the training.

  It does not allow cell clipping, a projection layer, and does not
  use peep-hole connections: it is the basic baseline.

  For advanced models, please use the full LSTMCell that follows.
  """

  def __init__(self, num_units, forget_bias=1.0, input_size=None,
               state_is_tuple=True, activation=tanh):
    """Initialize the basic LSTM cell.

    Args:
      num_units: int, The number of units in the LSTM cell.
      forget_bias: float, The bias added to forget gates (see above).
      input_size: Deprecated and unused.
      state_is_tuple: If True, accepted and returned states are 2-tuples of
        the `c_state` and `m_state`. If False, they are concatenated
        along the column axis. The latter behavior will soon be deprecated.
      activation: Activation function of the inner states.
    """
    if not state_is_tuple:
      logging.warn("%s: Using a concatenated state is slower and will soon be "
                   "deprecated. Use state_is_tuple=True.", self)
    if input_size is not None:
      logging.warn("%s: The input_size parameter is deprecated.", self)
    self._num_units = num_units
    self._forget_bias = forget_bias
    self._state_is_tuple = state_is_tuple
    self._activation = activation

  @property
  def state_size(self):
    return (LSTMStateTuple(self._num_units, self._num_units)
            if self._state_is_tuple else 2 * self._num_units)

  @property
  def output_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Long short-term memory cell (LSTM)."""
    with vs.variable_scope(scope or type(self).__name__):  # "BasicLSTMCell"
      # Parameters of gates are concatenated into one multiply for efficiency.
      if self._state_is_tuple:
        c, h = state
      else:
        c, h = array_ops.split(1, 2, state)
      concat = _linear([inputs, h], 4 * self._num_units, True)

      # i = input_gate, j = new_input, f = forget_gate, o = output_gate
      i, j, f, o = array_ops.split(1, 4, concat)

      new_c = (c * sigmoid(f + self._forget_bias) + sigmoid(i) *
               self._activation(j))
      new_h = self._activation(new_c) * sigmoid(o)

      if self._state_is_tuple:
        new_state = LSTMStateTuple(new_c, new_h)
      else:
        new_state = array_ops.concat(1, [new_c, new_h])
      return new_h, new_state


def _get_concat_variable(name, shape, dtype, num_shards):
  """Get a sharded variable concatenated into one tensor."""
  sharded_variable = _get_sharded_variable(name, shape, dtype, num_shards)
  if len(sharded_variable) == 1:
    return sharded_variable[0]

  concat_name = name + "/concat"
  concat_full_name = vs.get_variable_scope().name + "/" + concat_name + ":0"
  for value in ops.get_collection(ops.GraphKeys.CONCATENATED_VARIABLES):
    if value.name == concat_full_name:
      return value

  concat_variable = array_ops.concat(0, sharded_variable, name=concat_name)
  return concat_variable


def _get_sharded_variable(name, shape, dtype, num_shards):
  """Get a list of sharded variables with the given dtype."""
  if num_shards > shape[0]:
    raise ValueError("Too many shards: shape=%s, num_shards=%d" %
                     (shape, num_shards))
  unit_shard_size = int(math.floor(shape[0] / num_shards))
  remaining_rows = shape[0] - unit_shard_size * num_shards

  shards = []
  for i in range(num_shards):
    current_size = unit_shard_size
    if i < remaining_rows:
      current_size += 1
    shards.append(vs.get_variable(name + "_%d" % i, [current_size] + shape[1:],
                                  dtype=dtype))
  return shards
I can see the i, j, f and o gates being used in __call__. However, when I print them I only get Tensor objects, and when I try to .eval() them in a Session I get errors. I also tried tf.get_variable but was not able to extract the weight matrices. My question: is there a way to evaluate the i, j, f and o gate weights/matrices?
Upvotes: 3
Views: 2005
Reputation: 1435
First, to clear some confusion: i, j, f and o tensors are not weight matrices; they are intermediate calculation steps that depend on particular LSTM cell input. All the weights of the LSTM cell are stored in variables self._kernel and self._bias, and in a constant self._forget_bias.
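To make the distinction concrete, here is a minimal NumPy sketch of how per-gate weights relate to the single kernel. It assumes the layout implied by the source quoted in the question: rows ordered as [inputs, h] and columns ordered as i, j, f, o. The random arrays are stand-ins for values fetched from a session, and the names W_i, W_i_x and so on are my own, not TensorFlow's.

```python
import numpy as np

num_input, num_units = 4, 3  # sizes chosen for illustration only

# Stand-ins for the fetched kernel and bias values (assumption: in practice
# these would be the arrays returned by evaluating the cell's variables).
rng = np.random.RandomState(0)
kernel = rng.normal(size=(num_input + num_units, 4 * num_units))
bias = rng.normal(size=(4 * num_units,))

# Column blocks follow the split order i, j, f, o.
W_i, W_j, W_f, W_o = np.split(kernel, 4, axis=1)
b_i, b_j, b_f, b_o = np.split(bias, 4)

# Within a block, the first num_input rows multiply the input x and the
# remaining num_units rows multiply the previous hidden state h.
W_i_x, W_i_h = W_i[:num_input], W_i[num_input:]

# One step for a single example: the fused matmul followed by a split gives
# the same i as multiplying by the i blocks directly.
x = rng.normal(size=(1, num_input))
h = rng.normal(size=(1, num_units))
xh = np.concatenate([x, h], axis=1)
i, j, f, o = np.split(xh @ kernel + bias, 4, axis=1)
i_from_blocks = x @ W_i_x + h @ W_i_h + b_i

print(np.allclose(i, i_from_blocks))
```

So the "gate weight matrices" are just column slices of the kernel, while i, j, f and o themselves change with every input.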
So, to answer both possible interpretations of your question, I'll show how to print the values of self._kernel and self._bias, and the values of the i, j, f and o tensors at every step.
Suppose we have the following graph:
import numpy as np
import tensorflow as tf
timesteps = 7
num_input = 4
num_units = 3
x_val = np.random.normal(size=(1, timesteps, num_input))
lstm = tf.nn.rnn_cell.BasicLSTMCell(num_units = num_units)
X = tf.placeholder("float", [1, timesteps, num_input])
inputs = tf.unstack(X, timesteps, 1)
outputs, state = tf.contrib.rnn.static_rnn(lstm, inputs, dtype=tf.float32)
We can find the value of any tensor if we know its name. One way to find a tensor's name is to look at TensorBoard.
init = tf.global_variables_initializer()
graph = tf.get_default_graph()
with tf.Session(graph=graph) as sess:
    train_writer = tf.summary.FileWriter('./graph', sess.graph)
Now we can start TensorBoard by the terminal command
tensorboard --logdir=graph --host=localhost
and find that the operation which produces the i, j, f and o tensors is named 'rnn/basic_lstm_cell/split', while the kernel and bias are named 'rnn/basic_lstm_cell/kernel' and 'rnn/basic_lstm_cell/bias'.
The tf.contrib.rnn.static_rnn function calls our basic LSTM cell 7 times, once for every timestep. When TensorFlow is asked to create several operations under the same name, it adds suffixes to them, like this: rnn/basic_lstm_cell/split, rnn/basic_lstm_cell/split_1, ..., rnn/basic_lstm_cell/split_6. These are the names of our operations.
The name of a tensor in TensorFlow consists of the name of the operation that produces it, followed by a colon, followed by the index of that tensor among the operation's outputs. The kernel and bias ops each have a single output, so their tensor names end in :0:
kernel = graph.get_tensor_by_name("rnn/basic_lstm_cell/kernel:0")
bias = graph.get_tensor_by_name("rnn/basic_lstm_cell/bias:0")
The split operation produces four outputs: i, j, f and o, so these tensors' names will be:
i_list = []
j_list = []
f_list = []
o_list = []
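for suffix in ["", "_1", "_2", "_3", "_4", "_5", "_6"]:
    # Outputs 0..3 of each split op are i, j, f and o, in that order.
    i_list.append(graph.get_tensor_by_name("rnn/basic_lstm_cell/split{}:0".format(suffix)))
    j_list.append(graph.get_tensor_by_name("rnn/basic_lstm_cell/split{}:1".format(suffix)))
    f_list.append(graph.get_tensor_by_name("rnn/basic_lstm_cell/split{}:2".format(suffix)))
    o_list.append(graph.get_tensor_by_name("rnn/basic_lstm_cell/split{}:3".format(suffix)))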
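If the number of timesteps changes, the suffix list does not have to be typed by hand. A small sketch that builds the tensor names from the naming rule described above; the helper name split_tensor_names is hypothetical, and the default op name assumes the graph from this answer:

```python
def split_tensor_names(timesteps, op_name="rnn/basic_lstm_cell/split"):
    """Return {gate: [tensor name per timestep]} for the split ops."""
    gates = ["i", "j", "f", "o"]  # output indices 0..3 of each split op
    names = {gate: [] for gate in gates}
    for t in range(timesteps):
        # The first op has no suffix; later ones get _1, _2, ...
        suffix = "" if t == 0 else "_%d" % t
        for index, gate in enumerate(gates):
            names[gate].append("%s%s:%d" % (op_name, suffix, index))
    return names


names = split_tensor_names(7)
print(names["f"][:2])
# ['rnn/basic_lstm_cell/split:2', 'rnn/basic_lstm_cell/split_1:2']
```

Each name can then be passed to graph.get_tensor_by_name in the same way as above.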
Now we can find the values of all tensors:
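with tf.Session(graph=graph) as sess:
    train_writer = tf.summary.FileWriter('./graph', sess.graph)
    # Variables must be initialized before their values can be fetched.
    sess.run(init)
    # kernel and bias do not depend on the input, so no feed_dict is needed.
    weights = sess.run([kernel, bias])
    print("Weights:\n", weights)
    # The gate tensors depend on the placeholder X, so it has to be fed.
    i_values, j_values, f_values, o_values = sess.run(
        [i_list, j_list, f_list, o_list], feed_dict={X: x_val})
    print("i values:\n", i_values)
    print("j values:\n", j_values)
    print("f_values:\n", f_values)
    print("o_values:\n", o_values)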
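Note that the i, j, f and o values fetched this way are the raw outputs of the split, taken before any nonlinearity is applied. Going by the __call__ source quoted in the question, the quantities that actually enter the state update are sigmoid(i), activation(j), sigmoid(f + forget_bias) and sigmoid(o). A minimal NumPy sketch of that conversion, assuming the default tanh activation and forget_bias=1.0; the helper name gate_activations is my own, and zeros stand in for one timestep of fetched values:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gate_activations(i, j, f, o, forget_bias=1.0):
    """Map the raw split outputs to the values used in the state update."""
    return {
        "input_gate": sigmoid(i),
        "new_input": np.tanh(j),  # assumes activation=tanh (the default)
        "forget_gate": sigmoid(f + forget_bias),
        "output_gate": sigmoid(o),
    }


# Stand-in for one timestep of fetched values (batch of 1, 3 units).
pre = np.zeros((1, 3))
gates = gate_activations(pre, pre, pre, pre)
print(gates["forget_gate"])
```

With all-zero inputs the forget gate sits at sigmoid(1), which illustrates how forget_bias shifts it toward remembering at the start of training.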
Alternatively, we could find the tensor names by looking at the list of all tensors in a graph, which can be produced by:
tensors_per_node = [node.values() for node in graph.get_operations()]
tensor_names = [tensor.name for tensors in tensors_per_node for tensor in tensors]
Or, for a shorter list of all operations:
print([node.name for node in graph.get_operations()])
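The list of operation names can be long, so it may help to filter it down to the split ops and put them in timestep order. A sketch using only the standard library; the helper name split_ops_in_order is hypothetical, and the list below is a hand-written stand-in for the output of the line above, not a dump of a real graph:

```python
import re


def split_ops_in_order(op_names, prefix="rnn/basic_lstm_cell/split"):
    """Keep only 'prefix' and 'prefix_N' names, sorted by N (no suffix = 0)."""
    pattern = re.compile(re.escape(prefix) + r"(?:_(\d+))?$")
    found = []
    for name in op_names:
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1) or 0), name))
    # Sort numerically so that split_2 comes before split_10.
    return [name for _, name in sorted(found)]


# Hand-written stand-in for [node.name for node in graph.get_operations()].
op_names = [
    "rnn/basic_lstm_cell/split_10",
    "rnn/basic_lstm_cell/kernel",
    "rnn/basic_lstm_cell/split_2",
    "rnn/basic_lstm_cell/split",
    "rnn/basic_lstm_cell/split/split_dim",
]
ordered = split_ops_in_order(op_names)
print(ordered)
```

Appending ":0" to ":3" to each returned name gives the tensor names for i, j, f and o at that timestep.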
The third way is to read the source code and find which names are assigned to which tensors.
Upvotes: 2