Reputation: 883
Say I have one batch that I want to train my model on. Do I simply run tf.Session()'s sess.run(batch) once, or do I have to iterate through all of the batch's examples with a loop in the session? I'm looking for the optimal way to iterate/update the training ops, such as loss. I thought tensorflow would handle it itself, especially in the cases where tf.nn.dynamic_rnn() takes in a batch dimension for listing the examples. I thought, perhaps naively, that a for loop in the python code would be the inefficient method of updating the loss. I am using tf.losses.mean_squared_error(batch) for a regression problem.
My regression problem is given two lists of word vectors (300d each), and determines the similarity between the two lists on a continuous scale from [0, 5]. My supervised model is Deepmind's Differential Neural Computer (DNC). The problem is I do not believe it is learning anything. this is due to the fact that the all of the output from the model is centered around 0 and even negative. I do not know how it could possibly be negative given no negative labels provided. I only call sess.run(loss) for the single batch, I do not create a python loop to iterate through it.
So, what is the most efficient way to iterate the training of a model and how do people go about it? Do they really use python loops to do multiple calls to sess.run(loss) (this was done in the training file example for DNC, and I have seen it in other examples as well). I am certain I get the final loss from the below process, but I am uncertain if the model has actually been trained entirely just because the loss was processed in one go. I also do not understand the point of update_ops returned by some functions, and am uncertain if they are necessary to ensure the model has been trained.
Example of what I mean by processing a batch's loss once:
# assume the model has been defined prior through batch_output_logits
train_loss = tf.losses.mean_squared_error(labels=target,
predictions=batch_output_logits)
with tf.Session() as sess:
sess.run(init_op) # pseudo code, unnecessary for question
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
# is this the entire batch's loss && model has been trained for that batch?
loss_np = sess.run(train_step, train_loss)
coord.request_stop()
coord.join(threads)
Any input on why I am receiving negative values when the labels are in the range [0, 5] is welcomed as well(general abstract answers for this are fine, because its not the main focus). I am thinking of attempting to create a piece-wise function, if possible, for my loss, so that for any values out of bounds face a rapidly growing exponential loss function. Uncertain how to implement, or if it would even work.
Code is currently private. Once allowed, I will make the repo public.
To run DNC model, go to the project/
directory and run python -m src.main
. If there are errors you encounter feel free to let me know.
This model depends upon Tensorflow r1.2, most recent Sonnet, and NLTK's punkt for Tokenizing sentences in sts_handler.py and tests/*.
Upvotes: 0
Views: 472
Reputation: 759
In a regression model, the network calculates the model output based on the randomly initialized values for your model parameters. That's why you're seeing negative values here; you haven't trained your model enough for it to learn that your values are only between 0 and 5.
Unless I'm missing something, you are only calculating the loss, but you aren't actually training the model. You should probably be calling sess.run(optimizer)
on an optimizer, not on your loss function.
You probably need to train your model for multiple epochs (training your model for one epoch = training your model once on the entire dataset).
Batches are used because it is more computationally efficient to train your model on a batch than it is to train it on a single example. However, your data seems to be small enough that you won't have that problem. As such, I would recommend reducing your batch size to as low as possible. As a general rule, you get better training from a smaller batch size, at the cost of added computation.
If you post all of your code, I can take a look.
Upvotes: 1