Olric

Reputation: 338

How to make this 1 cell LSTM network?

I would like to make an LSTM network that learns to give me back the first value of the sequence each time there is a 0 in the sequence, and 0 otherwise.

Example:

x = 9 8 3 1 0 3 4
y = 0 0 0 0 9 0 0

The network memorizes a value and gives it back when it receives a special signal.
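
To make the task concrete, here is a small sketch of how I generate training pairs (the sequence length and value range are arbitrary choices of mine):

import numpy as np

def make_pair(length=7, high=10):
    x = np.random.randint(0, high, size=length)
    x[0] = np.random.randint(1, high)   # the first value, to be memorized
    y = np.where(x == 0, x[0], 0)       # echo x[0] at each 0, output 0 elsewhere
    return x, y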

I think I can do that with one LSTM cell, like this:

[figure: diagram of the single LSTM cell, with the weights in red and the biases inside the gray area]

Here is my model:

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

model2 = Sequential()
model2.add(LSTM(1, input_shape=(None, 1), return_sequences=True, name='lstm'))  # named so get_layer('lstm') works below
model2.add(TimeDistributed(Dense(1, activation='linear')))
model2.compile(loss='mse', optimizer='rmsprop')

and here is how I set the weights of my cell; however, I am not at all sure about the order:

# w : weights of x_t
# u : weights of h_{t-1}
# order of array: input_gate, new_input, forget_gate, output_gate 
#                 (Tensorflow order)

w = np.array([[0, 1, 0, -100]], dtype=np.float32)
u = np.array([[1, 0, 0, 0]], dtype=np.float32)
biases = np.array([0, 0, 1, 1], dtype=np.float32)
model2.get_layer('lstm').set_weights([w, u, biases])
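
To at least verify that my arrays have the shapes Keras expects, I print them back (this assumes the layer is named 'lstm' as in the model above):

w_k, u_k, b_k = model2.get_layer('lstm').get_weights()
print(w_k.shape, u_k.shape, b_k.shape)   # expect (1, 4), (1, 4), (4,) for 1 unit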

Am I right with the weights? Do they match what I put on the figure?

To work, it needs to have the right initial values. How do I set the initial value c of the cell state and h of the previous output? I have seen this in the source code:

    h_tm1 = states[0]  # previous memory state
    c_tm1 = states[1]  # previous carry state

but I couldn't find how to use that.
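
The closest thing I found is the initial_state argument of the functional API; a sketch of what I mean, assuming Keras 2 (I am not sure this is the intended way):

from keras.layers import Input, LSTM
from keras.models import Model

x_in = Input(shape=(None, 1))
h0 = Input(shape=(1,))    # initial hidden state h_{t-1}
c0 = Input(shape=(1,))    # initial cell state c_{t-1}
out = LSTM(1, return_sequences=True)(x_in, initial_state=[h0, c0])
model3 = Model([x_in, h0, c0], out)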

Upvotes: 1

Views: 135

Answers (1)

Daniel Möller

Reputation: 86630

Why not do this manually? It's so easy and it's an exact calculation. You don't need weights for that, and the operation is certainly not differentiable with respect to the weights.

Given an input tensor with shape (batch, steps, features):

from keras import backend as K
from keras.layers import Lambda

def processSequence(x):
    # first value of each sequence, shape (batch, 1, features)
    initial = x[:, 0:1]
    # 1 where the input is exactly 0, 0 elsewhere
    zeros = K.cast(K.equal(x, 0), K.floatx())
    # broadcasting places the first value at the zero positions
    return initial * zeros

model.add(Lambda(processSequence))
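
A quick check on the example from the question, as a sketch, assuming a model that consists only of this Lambda layer:

import numpy as np
from keras.models import Sequential

check_model = Sequential()
check_model.add(Lambda(processSequence, input_shape=(None, 1)))

x = np.array([9, 8, 3, 1, 0, 3, 4], dtype=np.float32).reshape(1, 7, 1)
print(check_model.predict(x).flatten())  # [0. 0. 0. 0. 9. 0. 0.]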

Warning: if you intend to use this with inputs coming from other layers, the probability of finding an exact zero will be so small that this layer will be useless.

Upvotes: 1
