Reputation: 155
I am new to RNN prediction with TensorFlow. I am trying to use an RNN with BasicLSTMCell to predict the next value of a sequence, such as
1,2,3,4,5 ->6
3,4,5,6,7 ->8
35,36,37,38,39 ->40
My code doesn't report an error, but the outputs for every batch seem to be the same, and the cost does not seem to decrease during training.
When I divided all training data by 100
0.01,0.02,0.03,0.04,0.05 ->0.06
0.03,0.04,0.05,0.06,0.07 ->0.08
0.35,0.36,0.37,0.38,0.39 ->0.40
the results are pretty good; the correlation between the predictions and the real values is very high (0.9998).
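The /100 experiment is just this kind of rescaling applied to the raw sequences and their labels (a rough sketch for illustration; the full code below standardizes the inputs with the mean and standard deviation instead):
raw = c(1, 2, 3, 4, 5, 6, 7, 8)  # one sequence plus its label
scaled = raw / 100               # 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08
# predictions on the scaled data can be mapped back to the original scale with * 100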
I suspect the problem has to do with integers versus floats, but I cannot explain why. Can anyone help? Many thanks!
Here is the code:
library(tensorflow)
# generate 100,000 sequences of 7 consecutive integers; the label is the next integer
start = sample(1:1000, 100000, T)
start1 = start + 1
start2 = start1 + 1
start3 = start2 + 1
start4 = start3 + 1
start5 = start4 + 1
start6 = start5 + 1
label = start6 + 1
data = data.frame(start, start1, start2, start3, start4, start5, start6, label)
data = as.matrix(data)
# 70/30 train/test split
n = nrow(data)
trainIndex = sample(1:n, size = round(0.7*n), replace = FALSE)
train = data[trainIndex, ]
test = data[-trainIndex, ]
train_data = train[, 1:7]
train_label = train[, 8]
# standardize the inputs with training-set statistics
means = apply(train_data, 2, mean)
sds = apply(train_data, 2, sd)
train_data = (train_data - means)/sds
test_data = test[, 1:7]
test_data = (test_data - means)/sds
test_label = test[, 8]
batch_size = 50L
n_inputs = 1L # one feature per time step
n_steps = 7L # time steps (sequence length)
n_hidden_units = 10L # neurons in the LSTM hidden layer
n_outputs = 1L # one regression output
x = tf$placeholder(tf$float32, shape(NULL, n_steps, n_inputs))
y = tf$placeholder(tf$float32, shape(NULL, 1L))
# input projection (n_inputs -> n_hidden_units) and output layer (n_hidden_units -> 1) parameters
weights_in = tf$Variable(tf$random_normal(shape(n_inputs, n_hidden_units)))
weights_out = tf$Variable(tf$random_normal(shape(n_hidden_units, 1L)))
biases_in = tf$Variable(tf$constant(0.1, shape = shape(n_hidden_units)))
biases_out = tf$Variable(tf$constant(0.1, shape = shape(1L)))
RNN = function(X, weights_in, weights_out, biases_in, biases_out)
{
  # flatten (batch, steps, inputs) to (batch*steps, inputs) for the input projection
  X = tf$reshape(X, shape = shape(-1, n_inputs))
  X_in = tf$sigmoid(tf$matmul(X, weights_in) + biases_in)
  # reshape back to (batch, steps, hidden) for dynamic_rnn
  X_in = tf$reshape(X_in, shape = shape(-1, n_steps, n_hidden_units))
  lstm_cell = tf$contrib$rnn$BasicLSTMCell(n_hidden_units, forget_bias = 1.0, state_is_tuple = T)
  init_state = lstm_cell$zero_state(batch_size, dtype = tf$float32)
  outputs_final_state = tf$nn$dynamic_rnn(lstm_cell, X_in, initial_state = init_state, time_major = F)
  # transpose to (steps, batch, hidden), unstack into per-step tensors,
  # and apply the linear output layer to the last time step
  outputs = tf$unstack(tf$transpose(outputs_final_state[[1]], shape(1, 0, 2)))
  results = tf$matmul(outputs[[length(outputs)]], weights_out) + biases_out
  return(results)
}
pred = RNN(x, weights_in, weights_out, biases_in, biases_out)
cost = tf$losses$mean_squared_error(pred, y)
train_op = tf$contrib$layers$optimize_loss(loss=cost, global_step=tf$contrib$framework$get_global_step(), learning_rate=0.05, optimizer="SGD")
init <- tf$global_variables_initializer()
sess <- tf$Session()
sess$run(init)
step = 0
while (step < 1000)
{
  # take the next batch of 50 training sequences
  train_data2 = train_data[(step*batch_size+1):(step*batch_size+batch_size), ]
  train_label2 = train_label[(step*batch_size+1):(step*batch_size+batch_size)]
  batch_xs <- sess$run(tf$reshape(train_data2, shape(batch_size, n_steps, n_inputs))) # reshape to (batch, steps, inputs)
  batch_ys = matrix(train_label2, ncol = 1)
  sess$run(train_op, feed_dict = dict(x = batch_xs, y = batch_ys))
  mycost <- sess$run(cost, feed_dict = dict(x = batch_xs, y = batch_ys))
  print(mycost)
  # prepare one batch of test data (prepared here but never fed to the network)
  test_data2 = test_data[(0*batch_size+1):(0*batch_size+batch_size), ]
  test_label2 = test_label[(0*batch_size+1):(0*batch_size+batch_size)]
  batch_xs <- sess$run(tf$reshape(test_data2, shape(batch_size, n_steps, n_inputs))) # reshape
  batch_ys = matrix(test_label2, ncol = 1)
  step = step + 1
}
Upvotes: 0
Views: 153
Reputation: 2594
First, it's quite useful to always normalize your network inputs (there are different approaches: divide by a maximum value, subtract the mean and divide by the standard deviation, and many more). This will help your optimizer a lot.
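For example, with the matrices from the question both approaches are a few lines of R (a quick sketch, assuming train_data and test_data hold the raw sequences; the statistics must come from the training set and be reused for the test set):
# scale by a maximum value taken from the training data
max_val = max(train_data)
train_scaled = train_data / max_val
test_scaled = test_data / max_val
# or standardize: subtract the mean and divide by the standard deviation
# (this is what the posted code already does for the inputs)
means = apply(train_data, 2, mean)
sds = apply(train_data, 2, sd)
train_std = (train_data - means) / sds
test_std = (test_data - means) / sds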
Second, and actually most important in your case, is the sigmoid activation. In your RNN function the sigmoid is applied to the projected inputs before the LSTM cell (and the LSTM gates use sigmoids internally as well). If you check the plot of the sigmoid function, you will see that it squashes every input into the range (0, 1) and becomes essentially flat (saturated) once the input is more than a few units away from zero. So when you feed in raw integers in the hundreds, practically all activations are pushed to 0 or 1, which is why every batch produces the same output and the cost does not decrease, while the scaled data stays in the range where the sigmoid is still informative. In general, avoid saturating activations on unnormalized data, and do not use any activation function at the output layer in a regression problem.
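You can see the saturation directly in R (just an illustration of the function itself, not part of the model code):
sigmoid = function(z) 1 / (1 + exp(-z))
round(sigmoid(c(0.01, 0.05, 0.40)), 4)  # 0.5025 0.5125 0.5987 -> scaled values are still distinguishable
round(sigmoid(c(5, 40, 400)), 4)        # 0.9933 1.0000 1.0000 -> large raw values all collapse to ~1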
Hope it helps.
Upvotes: 1