386er

Reputation: 87

LSTM-RNN: How to shape multivariate inputs

Hi everybody, I am struggling with the TensorFlow RNN implementation:

The problem:

I want to train an LSTM implementation of an RNN to detect malicious connections in the KDD99 dataset. It is a dataset with 41 features and (after some preprocessing) a label vector of size 5.

[ 
[x1, x2, x3, .....x40, x41],
... 
[x1, x2, x3, .....x40, x41]
]


[ 
[0, 1, 0, 0, 0],
...
[0, 0, 1, 0, 0]
]

As a basic architecture, I would like to implement the following:

def lstm_cell():
    cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
    return tf.nn.rnn_cell.DropoutWrapper(cell=cell, output_keep_prob=0.5)

# One fresh cell per layer; [cell] * 3 would make all layers share weights.
cell = tf.nn.rnn_cell.MultiRNNCell(cells=[lstm_cell() for _ in range(3)], state_is_tuple=True)

My question is: in order to feed them to the model, how would I need to reshape the input features?

Would I not just have to reshape the input features, but also build sliding-window sequences?

What I mean by that:

Assuming a sequence length of ten, the first sequence would contain data points 0 - 9, the second one data points 1 - 10, the third one 2 - 11, and so on.
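Something like this rough sketch is what I have in mind (assuming the features and labels are already preprocessed NumPy arrays, and that each window gets the label of its last record):

import numpy as np

def sliding_windows(features, labels, seq_len=10):
    # features: [num_records, 41], labels: [num_records, 5]
    xs, ys = [], []
    for i in range(len(features) - seq_len + 1):
        xs.append(features[i:i + seq_len])   # records i .. i+seq_len-1
        ys.append(labels[i + seq_len - 1])   # label of the window's last record
    return np.array(xs), np.array(ys)        # [N, seq_len, 41], [N, 5]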

Thanks!

Upvotes: 0

Views: 687

Answers (1)

Giuseppe Marra

Reputation: 1104

I do not know the dataset, but I think that your problem is the following: you have a very long sequence, and you want to know how to shape it in order to feed it to the network.

The tf.contrib.rnn.static_rnn function has the following signature:

tf.contrib.rnn.static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None)

where

inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size], or a nested tuple of such elements.

So the inputs need to be shaped into lists, where each element of the list is the element of the input sequence at each time step.
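For example, here is a minimal sketch of that shaping (the placeholder x, the window length of 10, and the single LSTM layer are my assumptions, not taken from your code):

import tensorflow as tf

T, num_features = 10, 41

# One batch of windows: [batch_size, T, num_features]
x = tf.placeholder(tf.float32, [None, T, num_features])

# static_rnn expects a length-T list of [batch_size, num_features]
# tensors, one per time step, so we split the time axis:
inputs = tf.unstack(x, num=T, axis=1)

cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
outputs, final_state = tf.contrib.rnn.static_rnn(cell, inputs, dtype=tf.float32)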

The length of this list depends on your problem and/or on computational constraints.

  • In Natural Language Processing, for example, the length of this list can be the maximum sentence length in your corpus, with shorter sentences padded to that length. In this case, as in many domains, the length of the sequence is driven by the problem.
  • However, you may have no such evidence in your problem, or you may still end up with a very long sequence. Long sequences are very heavy from a computational point of view: the BPTT algorithm, used to optimize these models, "unfolds" the recurrent network into a very deep feedforward network with shared parameters and backpropagates through it. In these cases, it is still convenient to "cut" the sequence to a fixed length.

And here we arrive at your question: given this fixed length, let us say 10, how do I shape my input?

Usually, what is done is to cut the dataset into non-overlapping windows (in your example, we would have 0 - 9, 10 - 19, 20 - 29, etc.). What happens here is that the network only looks at the last 10 elements of the sequence each time it updates the weights with BPTT.
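A minimal sketch of this cutting, assuming the features and labels are NumPy arrays with one row per record (the stand-in data is just for illustration):

import numpy as np

seq_len = 10
features = np.random.rand(1000, 41)                # stand-in for the KDD99 features
labels = np.eye(5)[np.random.randint(0, 5, 1000)]  # stand-in one-hot labels

# Drop the tail so the record count is a multiple of seq_len,
# then reshape into non-overlapping windows:
n = (len(features) // seq_len) * seq_len
windows = features[:n].reshape(-1, seq_len, 41)     # [100, 10, 41]
window_labels = labels[:n].reshape(-1, seq_len, 5)  # [100, 10, 5]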

However, since the sequence has been cut arbitrarily, predictions will likely need to exploit evidence that lies further back in the sequence, outside the current window. To allow this, we initialize the initial state of the RNN at window i with the final state of window i-1, using the parameter:

initial_state: (optional) An initial state for the RNN.
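Here is a minimal single-layer sketch of that state passing (the names, batch size, and stand-in data are my assumptions; in a real training loop you would run the optimizer in the same session call):

import numpy as np
import tensorflow as tf

T, num_features, batch_size = 10, 41, 1

x = tf.placeholder(tf.float32, [batch_size, T, num_features])
inputs = tf.unstack(x, num=T, axis=1)

cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, final_state = tf.contrib.rnn.static_rnn(cell, inputs,
                                                 initial_state=initial_state)

windows = np.random.rand(5, batch_size, T, num_features)  # stand-in windows

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = sess.run(initial_state)        # all zeros before the first window
    for window in windows:                 # consecutive, non-overlapping windows
        state = sess.run(final_state, feed_dict={
            x: window,
            initial_state.c: state.c,      # carry the LSTM cell state over
            initial_state.h: state.h,      # carry the hidden state over
        })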

Finally, I give you two sources to go into more detail:

  1. RNN Tutorial This is the official TensorFlow tutorial, applied to the task of language modeling. At a certain point in the code, you will see that the final state is fed back to the network from one run to the next, in order to implement what was said above.

    feed_dict = {}
    # Feed the state produced by the previous run back in as the initial
    # state of this run, one (c, h) pair per LSTM layer:
    for i, (c, h) in enumerate(model.initial_state):
      feed_dict[c] = state[i].c
      feed_dict[h] = state[i].h
    
  2. DevSummit 2017 This is a video of a talk from the TensorFlow Dev Summit 2017 where, in the first section (Reading and Batching Sequence Data), it is explained how, and with which functions, you should shape your sequence inputs.

Hope this helps :)

Upvotes: 2
