AlexR

Reputation: 103

How do I share weights across different RNN cells that feed in different inputs in TensorFlow?

I'm curious if there is a good way to share weights across different RNN cells while still feeding each cell different inputs.

The graph that I am trying to build is like this:

[diagram of the intended graph]

where the three LSTM cells shown in orange operate in parallel, and I would like them to share weights.

I've managed to implement something similar to what I want using a placeholder (see below for code). However, the placeholder breaks the optimizer's gradient calculations, so the layers feeding into the placeholder never get trained. Is there a better way to do this in TensorFlow?

I'm using TensorFlow 1.2 and Python 3.5 in an Anaconda environment on Windows 7.

Code:

def ann_model(cls,data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data,ann_weights) + ann_biases)
    return out

def rnn_lower_model(cls,data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length,1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i],[cls.batch_size,
                                                     cls.n_rnn_inputs])

        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)

        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)

        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                    tf.random_normal([cls.n_rnn_nodes_lower,1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))

        #Encode the output of the RNN into one estimate per entry in 
        #the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i],
                                          out_weights) 
                                          + out_biases)
    return predict_list

def create_graph(cls,sess):
    #Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float',[cls.batch_size,
                                       cls.sequence_length,
                                       cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float',[cls.batch_size,1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float',[cls.batch_size,1])

    #Define Placeholder to provide variable input into the 
    #RNNs with shared weights    
    cls.input_place = tf.placeholder('float',[cls.batch_size,
                                              cls.sequence_length,
                                              cls.n_rnn_inputs])

    #global step used in optimizer
    global_step = tf.Variable(0,trainable = False)

    #Create ANN
    ann_output = cls.ann_model(cls.c)
    #Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in 
                                            range(cls.sequence_length)],1),
                            [cls.batch_size,
                             cls.sequence_length,
                             cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq,cls.x],2)

    #Create 'unrolled' RNN by creating sequence_length many RNN Cells that
    #share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        #Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)] * 2

During training, mini-batches are computed in two steps:

RNNinput = sess.run(cls.rnn_input,feed_dict = {
                                            cls.x:batch_x,
                                            cls.y:batch_y,
                                            cls.c:batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place:RNNinput,
                                       cls.y:batch_y,
                                       cls.x:batch_x,
                                       cls.c:batch_c})
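
To make the break concrete: the first sess.run returns a plain NumPy array, so feeding it back in through cls.input_place gives the optimizer no gradient path to the ANN weights. A tiny, purely hypothetical illustration of the same disconnect (none of these names are from my model):

import tensorflow as tf

w = tf.Variable(1.0)
hidden = w * 3.0                        # plays the role of cls.rnn_input
hidden_ph = tf.placeholder(tf.float32)  # plays the role of cls.input_place

loss_via_placeholder = tf.square(hidden_ph - 2.0)
loss_connected = tf.square(hidden - 2.0)

print(tf.gradients(loss_via_placeholder, [w]))  # [None] -> no path back to w
print(tf.gradients(loss_connected, [w]))        # gradient tensor -> path exists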

Thanks for your help. Any ideas would be appreciated.

Upvotes: 4

Views: 2652

Answers (2)

Vijay Mariappan

Reputation: 17201

You have 3 different inputs: input_1, input_2, input_3, which are fed to an LSTM model whose parameters are shared. You then concatenate the outputs of the 3 LSTMs and pass them to a final LSTM layer. The code should look something like this:

 # Create input placeholder for the network
 input_1 = tf.placeholder(...)
 input_2 = tf.placeholder(...)
 input_3 = tf.placeholder(...)

 # create a shared rnn layer 
 def shared_rnn(...):
    ...
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...)

 # generate the outputs for each input
 with tf.variable_scope('lower_lstm') as scope:
    out_input_1 = shared_rnn(...)
    scope.reuse_variables() # the variables will be reused.
    out_input_2 = shared_rnn(...)
    scope.reuse_variables()
    out_input_3 = shared_rnn(...)

 # verify whether the variables are reused
 for v in tf.global_variables():
    print(v.name)

 # concat the three outputs
 output = tf.concat...  

 # Pass it to the final_lstm layer and out the logits
 logits = final_layer(output, ...)

 train_op = ...

 # train
 sess.run(train_op, feed_dict={input_1: in1, input_2: in2, input_3: in3, labels: ...})
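
Since the snippet above elides most details, here is a more self-contained sketch of the same reuse pattern for TF 1.x (the sizes and names such as n_hidden, seq_len and final_lstm are illustrative assumptions, not taken from the question):

import tensorflow as tf

# Illustrative sizes only (assumed, not from the question)
batch_size, seq_len, n_in, n_hidden = 32, 10, 8, 64

input_1 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
input_2 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
input_3 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])

def shared_rnn(x):
    # BasicLSTMCell creates its variables inside the enclosing variable scope,
    # so calling this under a reused scope shares the LSTM weights.
    cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
    return outputs

with tf.variable_scope('lower_lstm') as scope:
    out_1 = shared_rnn(input_1)
    scope.reuse_variables()        # later calls reuse the same kernel/bias
    out_2 = shared_rnn(input_2)
    out_3 = shared_rnn(input_3)

# Verify that the lower LSTM variables were created only once
for v in tf.global_variables():
    print(v.name)

# Concatenate along the feature axis and feed a final LSTM layer
merged = tf.concat([out_1, out_2, out_3], axis=2)
with tf.variable_scope('final_lstm'):
    final_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    final_out, _ = tf.nn.dynamic_rnn(final_cell, merged, dtype=tf.float32)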

Upvotes: 2

AlexR

Reputation: 103

I ended up rethinking my architecture a little and came up with a more workable solution.

Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The results of each run were stored in a buffer-like tf.Variable, and that whole variable was then used as the input to the final LSTM layer. I drew a diagram here.

Implementing it this way produced valid outputs after 3 time steps and didn't break TensorFlow's backpropagation (i.e., the nodes in the ANN could still train).

The only tricky thing was to make sure that the buffer was in the correct sequential order for the final RNN.
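
For reference, a rough TF 1.x sketch of that idea (the names and sizes are hypothetical, and it stacks the three runs with tf.stack instead of writing them into a tf.Variable buffer, which keeps everything differentiable while preserving the sequential order):

import tensorflow as tf

# Hypothetical sizes for illustration
batch_size, seq_len, n_in, n_hidden = 32, 10, 8, 64

# Three input slices that all go through the *same* lower LSTM
slices = [tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
          for _ in range(3)]

def lower_model(x):
    cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
    return outputs[:, -1, :]              # keep the last output of each run

runs = []
with tf.variable_scope('lower_rnn') as scope:
    for i, x in enumerate(slices):
        if i > 0:
            scope.reuse_variables()       # runs 2 and 3 reuse run 1's weights
        runs.append(lower_model(x))

# 'Buffer' the three results in sequential order as a length-3 sequence
buffered = tf.stack(runs, axis=1)         # shape [batch_size, 3, n_hidden]

with tf.variable_scope('final_rnn'):
    final_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    final_out, _ = tf.nn.dynamic_rnn(final_cell, buffered, dtype=tf.float32)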

Upvotes: 0
