I've just started using TensorFlow to build LSTM networks for multiclass classification.
Given the structure of an RNN model (diagram not shown here), let's assume each node A represents a TensorFlow BasicLSTMCell.
According to some popular examples found online, the input for training is prepared as [batch_size, timeStep_size, feature_size].
Let's assume timeStep_size = 5, feature_size = 2, num_class = 4, given one training set (dummy data):
t =    t0   t1   t2   t3   t4
x = [ [1]  [2]  [2]  [5]  [2] ]
    [ [2]  [3]  [3]  [1]  [2] ]
y = [ [0]  [1]  [1]  [0]  [0] ]
    [ [1]  [0]  [0]  [0]  [0] ]
    [ [0]  [0]  [0]  [0]  [1] ]
    [ [0]  [0]  [0]  [1]  [0] ]
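For concreteness, here is a minimal sketch (plain NumPy, batch_size = 1 assumed) of how this dummy example maps onto the [batch_size, timeStep_size, feature_size] layout:

import numpy as np

timeStep_size, feature_size, num_class = 5, 2, 4

# x: [batch_size, timeStep_size, feature_size] -- 1 sample, 5 steps, 2 features per step
x = np.array([[[1, 2], [2, 3], [2, 3], [5, 1], [2, 2]]], dtype=np.float32)
print(x.shape)   # (1, 5, 2)

# y as consumed by the "popular usage" below: one one-hot label per sequence,
# i.e. only the label at the last step t4 (here class index 2)
y = np.array([[0, 0, 1, 0]], dtype=np.float32)
print(y.shape)   # (1, 4)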
According to the popular usage:
...
# 1-layer LSTM with n_hidden units.
rnn_cell = rnn.BasicLSTMCell(n_hidden)
# x here is a list of timeStep_size tensors, each of shape [batch_size, feature_size]
outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)
# generate prediction from the output at the last time step only
return tf.matmul(outputs[-1], weights['out']) + biases['out']
It seems to me that the training of the LSTM cell doesn't make use of all five labels in y (y at t0 through t3). Only y at time t4 is used for calculating the loss, by comparing it against outputs[-1].
Question 1: is it the case that the LSTM calculates/approximates y_t0 by itself, feeds it into t1 to calculate y_t1, and so on, until y_t4 is calculated?
If this is the case,
Question 2: what if y at t-1 is very important?
Example:
t =    t-1  t0   t1   t2   t3   t4
x = [ [1]  [2]  [2]  [2]  [2]  [2] ]
    [ [1]  [2]  [2]  [2]  [2]  [2] ]
y = [ [0]  [1]  [1]  [1]  [1]  [1] ]
    [ [1]  [0]  [0]  [0]  [0]  [0] ]
    [ [0]  [0]  [0]  [0]  [0]  [0] ]
    [ [0]  [0]  [0]  [0]  [0]  [0] ]
VS:
t =    t-1  t0   t1   t2   t3   t4
x = [ [3]  [2]  [2]  [2]  [2]  [2] ]
    [ [3]  [2]  [2]  [2]  [2]  [2] ]
y = [ [0]  [0]  [0]  [0]  [0]  [0] ]
    [ [0]  [0]  [0]  [0]  [0]  [0] ]
    [ [1]  [0]  [0]  [0]  [0]  [0] ]
    [ [0]  [1]  [1]  [1]  [1]  [1] ]
This means that even though the input features from t0 to t4 are the same, the outputs y are different, because the previous output (y at t-1) is different.
So how should this kind of situation be handled? How does TensorFlow set the output for t-1 when calculating the output at t0?
I've thought about increasing timeStep_size, but in the real case the sequences might be very long, so I'm a bit confused...
Any pointers are highly appreciated!
Thank you in advance.
================= UPDATE ===============================
Re: jdehesa, thanks again.
Some additional background: my intention is to classify a long series of x, like below:
t =    t0   t1   t2   t3   t4   t5   t6   t7   t8   t9   t10  t11  ...
x = [ [3]  [2]  [2]  [2]  [2]  [2]  [1]  [2]  [2]  [2]  [2]  [2]  ... ]
    [ [3]  [2]  [2]  [2]  [2]  [2]  [1]  [2]  [2]  [2]  [2]  [2]  ... ]
y = [  c3   c2   c2   c2   c2   c2   c1   c4   c4   c4   c4   c4   ... ]
Note: c1: class 1, c2: class 2, c3: class 3, c4: class 4.
The main source of confusion behind this post is that there are some known rules used for manual classification. Take the dummy data above for example, and assume there are rules like the following (sketched in code below):
if the previous feature x is class 3 ([3, 3]), then all following [2, 2] will be class 2 until it reaches class 1;
if the previous x is class 1 ([1, 1]), then all following [2, 2] will be class 4 until it reaches class 3.
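In code form, the kind of manual rule I have in mind looks roughly like this (plain Python; the function name and the fallback label are placeholders, just to make the rule explicit):

def classify_by_rule(sequence):
    """sequence: list of [f1, f2] feature pairs; returns one class label per step."""
    labels = []
    mode = None          # remembers whether the last marker seen was class 3 or class 1
    for f in sequence:
        if f == [3, 3]:
            label, mode = 3, 'after_3'
        elif f == [1, 1]:
            label, mode = 1, 'after_1'
        elif f == [2, 2] and mode == 'after_3':
            label = 2
        elif f == [2, 2] and mode == 'after_1':
            label = 4
        else:
            label = 0    # case not covered by the stated rules
        labels.append(label)
    return labels

# e.g. classify_by_rule([[3, 3], [2, 2], [2, 2], [1, 1], [2, 2]]) -> [3, 2, 2, 1, 4]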
In such a case, if the LSTM only sees a [5 x 2] feature vector (x) like the one from t1 to t4, the network will be completely lost as to whether to classify it as class 2 or class 4. So what I mean is that not only do the features of the 5 time steps matter, the output/label of the previous time step does too.
So, to restate the question: if the training set is now t1 to t5, then in addition to x [batch_size, t1:t5, 2], how do I involve the label/class y at t0 as well?
Below are my responses to your answer.
Consider that I use a GRU instead of an LSTM, where the cell output and the cell state are both represented by "h", as in understanding LSTM.
About the initial_state parameter: I just found that dynamic_rnn and static_rnn take this parameter, as you pointed out :D. If I were to solve the problem mentioned just now, could I assign the previous class/label (y at t0) to the initial_state parameter before training, instead of using zero_state?
I suddenly feel like I'm totally lost about the time span of LSTM memory. I've been thinking that the time span of the memory is limited by timeStep_size only: if timeStep_size = 5, the network can only recall up to 4 steps back, since in every training step we only feed a [5 x 2] x feature vector. Please correct me if I'm wrong.
Again, thank you so much.
================= ANSWER (jdehesa) =====================
LSTM cells, or RNN cells in general, have an internal state that gets updated after each time step is processed. Obviously, you cannot go infinitely back in time, so you have to start at some point. The general convention is to begin with a cell state full of zeros; in fact, RNN cells in TensorFlow have a zero_state method that returns this kind of state for each particular cell type and size. If you are not happy with that starting point (for example, because you have processed half a sequence and now you want to process the other half, picking up at the same state you were in), you can pass an initial_state parameter to tf.nn.dynamic_rnn.
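A minimal sketch of that (TF 1.x API; the batch size, shapes and n_hidden value are placeholders):

import tensorflow as tf

batch_size, time_steps, feature_size, n_hidden = 32, 5, 2, 64
x = tf.placeholder(tf.float32, [batch_size, time_steps, feature_size])

cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)

# Default starting point: an all-zeros state for this cell type and size
init_state = cell.zero_state(batch_size, dtype=tf.float32)

# final_state can later be fed back as initial_state to continue the same sequence
outputs, final_state = tf.nn.dynamic_rnn(cell, x, initial_state=init_state)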
About the training, I'm not sure what the most popular usage of LSTM cells is, but that's entirely up to you. I work on a problem where I have a label per time sample, so my output is the same size as the input. However, in many cases you just want one label for the whole sequence (e.g. "this sentence is positive/negative"), so you just look at the last output. All the previous inputs are of course important too, because they define the last cell state that is used in combination with the last input to determine the final output. For example, if you take a sentence like "That's cool, man" and process it word by word, the last word "man" will probably not tell you much by itself about whether the sentence is positive or negative, but at that point the cell is in a state where it is pretty sure the sentence is positive (that is, it would take a clearly negative input afterwards to make it produce a "negative" output).
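To make the two set-ups concrete, here is a rough sketch (the weight names and shapes are placeholders; `outputs` stands in for the [batch_size, time_steps, n_hidden] tensor returned by tf.nn.dynamic_rnn):

import tensorflow as tf

batch_size, time_steps, n_hidden, num_class = 32, 5, 64, 4
# stand-in for the outputs returned by tf.nn.dynamic_rnn
outputs = tf.placeholder(tf.float32, [batch_size, time_steps, n_hidden])

W_out = tf.Variable(tf.random_normal([n_hidden, num_class]))
b_out = tf.Variable(tf.zeros([num_class]))

# one label for the whole sequence: use only the last output
logits_last = tf.matmul(outputs[:, -1, :], W_out) + b_out       # [batch_size, num_class]

# one label per time step: project every output
logits_all = tf.tensordot(outputs, W_out, axes=1) + b_out       # [batch_size, time_steps, num_class]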
I'm not sure what you mean about the t-1 thing... I mean, if your input starts at t0 and you never saw t-1, there is nothing you can do about that (e.g. if you only got the input "really like this food" but it turns out the whole original sentence was "not really like this food", you will just get it completely wrong). However, if you do have that input, the network will learn to take it into account if it really is important. The whole point of LSTM cells is that they are able to remember things from very far in the past (i.e. the effect of an input on the internal state can reach across a very long time span).
Update:
About your additional comments.
You can use whatever you want as the input state, of course. However, even with a GRU the internal state does not usually match the output label. Typically, you would use a sigmoid or softmax activation after the recurrent unit, which would then produce an output comparable to the labels.
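Roughly like this (a sketch with placeholder sizes; the point is that the GRU state has n_hidden dimensions, not num_class, so a projection plus softmax is what becomes comparable to the one-hot labels):

import tensorflow as tf

batch_size, time_steps, feature_size, n_hidden, num_class = 32, 5, 2, 64, 4
x = tf.placeholder(tf.float32, [batch_size, time_steps, feature_size])

cell = tf.nn.rnn_cell.GRUCell(n_hidden)
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)   # state: [batch_size, n_hidden]

# the state itself is not a class label; project and squash to get class probabilities
logits = tf.layers.dense(outputs[:, -1, :], num_class)
probs = tf.nn.softmax(logits)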
About time spans. It is correct that using inputs with a small time step will constrain the ability of the recurrent units to learn long-term dependencies (to find patterns in longer sequences). As I mentioned, you can "emulate" a longer time step if you feed the last state of the recurrent units as the initial state for the next run. But whether you do that or not, it is not exactly true that the LSTM unit will simply "not remember" things further back in the past. Even if you train with a time step of 5, if you then run the network on a sequence of size 100, the output for the last input will (potentially) be affected by all the 99 previous inputs; you simply will not be able to tell how much they affect it, because that is a case you did not have during training.
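A sketch of that "feed the last state back in" idea (placeholder names and sizes; a GRU keeps the example simple because its state is a single tensor):

import numpy as np
import tensorflow as tf

batch_size, chunk_len, feature_size, n_hidden = 1, 5, 2, 64
x = tf.placeholder(tf.float32, [batch_size, chunk_len, feature_size])
state_in = tf.placeholder(tf.float32, [batch_size, n_hidden])

cell = tf.nn.rnn_cell.GRUCell(n_hidden)
outputs, state_out = tf.nn.dynamic_rnn(cell, x, initial_state=state_in)

long_sequence = np.random.rand(batch_size, 20, feature_size).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = np.zeros((batch_size, n_hidden), np.float32)          # same as zero_state
    for chunk in np.split(long_sequence, 4, axis=1):              # four 5-step chunks
        out, state = sess.run([outputs, state_out],
                              feed_dict={x: chunk, state_in: state})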