user2146141

Reputation: 155

RNN sequence learning

I am new to TensorFlow RNN prediction. I am trying to use an RNN with BasicLSTMCell to predict the next value of a sequence, such as

1,2,3,4,5 ->6
3,4,5,6,7 ->8
35,36,37,38,39 ->40

My code doesn't report an error, but the outputs for every batch seem to be the same, and the cost does not seem to decrease during training.

But when I divided all the training data by 100,

0.01,0.02,0.03,0.04,0.05 ->0.06
0.03,0.04,0.05,0.06,0.07 ->0.08 
0.35,0.36,0.37,0.38,0.39 ->0.40

the result was pretty good: the correlation between the predictions and the real values was very high (0.9998).

I suspect the problem has something to do with integers versus floats, but I cannot explain why. Can anyone help? Many thanks!

Here is the code:

library(tensorflow)
# build 100,000 training examples: 7 consecutive integers as input, the next integer as the label
start  = sample(1:1000, 100000, T)
start1 = start  + 1
start2 = start1 + 1
start3 = start2 + 1
start4 = start3 + 1
start5 = start4 + 1
start6 = start5 + 1
label  = start6 + 1
data = data.frame(start, start1, start2, start3, start4, start5, start6, label)
data = as.matrix(data)
n = nrow(data)
trainIndex = sample(1:n, size = round(0.7*n), replace=FALSE)
train = data[trainIndex ,]
test = data[-trainIndex ,]
train_data= train[,1:7]
train_label= train[,8]
means=apply(train_data, 2, mean)
sds= apply(train_data, 2, sd)
train_data=(train_data-means)/sds
test_data=test[,1:7]
test_data=(test_data-means)/sds
test_label=test[,8]
batch_size = 50L
n_inputs = 1L               # one value fed to the network per time step
n_steps = 7L                # time steps per sequence
n_hidden_units = 10L        # neurons in the hidden layer
n_outputs = 1L              # one predicted value
x = tf$placeholder(tf$float32, shape(NULL, n_steps, n_inputs))
y = tf$placeholder(tf$float32, shape(NULL, 1L))
weights_in= tf$Variable(tf$random_normal(shape(n_inputs, n_hidden_units)))
weights_out= tf$Variable(tf$random_normal(shape(n_hidden_units, 1L)))
biases_in=tf$Variable(tf$constant(0.1, shape= shape(n_hidden_units )))
biases_out = tf$Variable(tf$constant(0.1, shape=shape(1L)))
RNN=function(X, weights_in, weights_out, biases_in, biases_out)
{
    X = tf$reshape(X, shape=shape(-1, n_inputs))
    X_in = tf$sigmoid(tf$matmul(X, weights_in) + biases_in)
    X_in = tf$reshape(X_in, shape=shape(-1, n_steps, n_hidden_units))
    lstm_cell = tf$contrib$rnn$BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=T)
    init_state = lstm_cell$zero_state(batch_size, dtype=tf$float32)
    outputs_final_state = tf$nn$dynamic_rnn(lstm_cell, X_in, initial_state=init_state, time_major=F)
    outputs= tf$unstack(tf$transpose(outputs_final_state[[1]], shape(1,0,2)))
    results =  tf$matmul(outputs[[length(outputs)]], weights_out) + biases_out
    return(results)
}
pred = RNN(x, weights_in, weights_out, biases_in, biases_out)
cost = tf$losses$mean_squared_error(pred, y)
train_op = tf$contrib$layers$optimize_loss(loss=cost, global_step=tf$contrib$framework$get_global_step(), learning_rate=0.05, optimizer="SGD")
init <- tf$global_variables_initializer()
sess <- tf$Session()
sess$run(init)
step = 0
while (step < 1000)
{
  # next training batch of batch_size consecutive rows
  train_data2  = train_data[(step * batch_size + 1):(step * batch_size + batch_size), ]
  train_label2 = train_label[(step * batch_size + 1):(step * batch_size + batch_size)]
  batch_xs = sess$run(tf$reshape(train_data2, shape(batch_size, n_steps, n_inputs)))  # reshape to (batch, steps, inputs)
  batch_ys = matrix(train_label2, ncol = 1)
  # one optimization step, then report the cost on the same batch
  sess$run(train_op, feed_dict = dict(x = batch_xs, y = batch_ys))
  mycost = sess$run(cost, feed_dict = dict(x = batch_xs, y = batch_ys))
  print(mycost)
  # first test batch (prepared here, but never actually evaluated in this loop)
  test_data2  = test_data[1:batch_size, ]
  test_label2 = test_label[1:batch_size]
  batch_xs = sess$run(tf$reshape(test_data2, shape(batch_size, n_steps, n_inputs)))  # reshape
  batch_ys = matrix(test_label2, ncol = 1)
  step = step + 1
}

Upvotes: 0

Views: 153

Answers (1)

asakryukin

Reputation: 2594

First, it is almost always useful to normalize your network inputs (there are different approaches: divide by the maximum value, subtract the mean and divide by the standard deviation, and many more). This helps the optimizer a lot.
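For example, here is a rough sketch of both approaches in R, reusing the train_data / test_data matrices from your code and computing the statistics on the training set only (train_norm / test_norm are hypothetical names):

# Option 1: divide by the maximum value seen in the training data
max_val    = max(train_data)
train_norm = train_data / max_val
test_norm  = test_data / max_val      # reuse the training maximum for the test set

# Option 2: z-score -- subtract the training mean and divide by the training sd, column-wise
means      = apply(train_data, 2, mean)
sds        = apply(train_data, 2, sd)
train_norm = sweep(sweep(train_data, 2, means, "-"), 2, sds, "/")
test_norm  = sweep(sweep(test_data, 2, means, "-"), 2, sds, "/")

Note that in your divide-by-100 experiment the labels were scaled as well, so the regression targets also ended up in a small range.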

Second, and actually most important in your case: you are applying a sigmoid inside your RNN function (to the projected inputs, right before the LSTM). If you check the plot of the sigmoid function, you will see that it squashes all of its inputs into the range (0, 1). So basically, no matter how big your inputs are, what flows through the network is at most 1, which makes it very hard to fit large, unnormalized targets. More generally, you should not use any saturating activation function on the output layer in regression problems.
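Applied to the RNN() function from the question, that means keeping both the input projection and the output layer linear. Here is a minimal sketch with the sigmoid removed (same variable names as in the question, not tested):

RNN = function(X, weights_in, weights_out, biases_in, biases_out)
{
    # project each scalar input to n_hidden_units features; no sigmoid here,
    # so nothing squashes the values into (0, 1)
    X = tf$reshape(X, shape=shape(-1, n_inputs))
    X_in = tf$matmul(X, weights_in) + biases_in
    X_in = tf$reshape(X_in, shape=shape(-1, n_steps, n_hidden_units))
    lstm_cell = tf$contrib$rnn$BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=T)
    init_state = lstm_cell$zero_state(batch_size, dtype=tf$float32)
    outputs_final_state = tf$nn$dynamic_rnn(lstm_cell, X_in, initial_state=init_state, time_major=F)
    outputs = tf$unstack(tf$transpose(outputs_final_state[[1]], shape(1, 0, 2)))
    # linear output layer: no activation on the regression output
    results = tf$matmul(outputs[[length(outputs)]], weights_out) + biases_out
    return(results)
}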

Hope it helps.

Upvotes: 1
