Reputation: 6197
I thought that the TensorFlow saver would save all variables, as stated here:
If you do not pass any arguments to tf.train.Saver(), the saver handles all variables in the graph. Each variable is saved under the name that was passed when the variable was created.
https://www.tensorflow.org/programmers_guide/saved_model
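To make my expectation concrete, here is a minimal sketch (toy variable names, TF 1.x style, not my actual model) of what I understood the docs to mean: a Saver built with no arguments should round-trip every variable in the graph, even one that is never used in the loss.
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    counter = tf.get_variable('counter', initializer=0)              # not used in any loss
    weights = tf.get_variable('weights', initializer=tf.zeros([3]))
    demo_saver = tf.train.Saver()  # no arguments -> should handle both variables

with tf.Session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.assign(counter, 5))
    demo_saver.save(sess, './demo.ckpt')

with tf.Session(graph=g) as sess:
    demo_saver.restore(sess, './demo.ckpt')
    print(sess.run(counter))  # expected: 5, not the initializer value 0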
However, the variable epochCount in my code below does not seem to get saved. This variable is used to keep track of the total number of epochs the model has trained over the data.
When I restore the graph, epochCount resets to its initializer value, not the value it had when the checkpoint was last saved.
It appears to me that it's only saving variables used in calculating the loss.
Here's my code.
This is where I declare my graph:
graph = tf.Graph()

with graph.as_default():
    # put inside graph to get new words each time
    valid_examples = np.array(random.sample(range(1, valid_window), valid_size))

    train_dataset = tf.placeholder(tf.int32, shape=[batch_size, cbow_window*2])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
    valid_datasetSM = tf.constant(valid_examples, dtype=tf.int32)

    # to store epoch count so the total number of epochs is known
    epochCount = tf.get_variable('epochCount', initializer=0)

    embeddings = tf.get_variable('embeddings',
        initializer=tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.get_variable('softmax_weights',
        initializer=tf.truncated_normal([vocabulary_size, embedding_size],
                                        stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.get_variable('softmax_biases',
        initializer=tf.zeros([vocabulary_size]), trainable=False)

    embed = tf.nn.embedding_lookup(embeddings, train_dataset)  # train data set is
    embed_reshaped = tf.reshape(embed, [batch_size*cbow_window*2, embedding_size])
    segments = np.arange(batch_size).repeat(cbow_window*2)
    averaged_embeds = tf.segment_mean(embed_reshaped, segments, name=None)

    loss = tf.reduce_mean(
        tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases,
                                   inputs=averaged_embeds, labels=train_labels,
                                   num_sampled=num_sampled, num_classes=vocabulary_size))

    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)  # original learning rate was 1.0

    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keepdims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))

    saver = tf.train.Saver()
If I restore the graph from a checkpoint, the embeddings and softmax_biases are restored, but epochCount is reset to its initializer value. (Note that I am not calling tf.global_variables_initializer().run(), which is a common cause of variables mistakenly being reset after a checkpoint has been restored.)
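Just to be explicit about the pitfall I am avoiding, this is the pattern that would cause such a reset (a sketch using the same restore call as in my code below, with the bad line left commented out):
with tf.Session(graph=graph) as session:
    saver.restore(session, './checkpointsBook2VecCbowWindow2Downloaded/bookVec.ckpt')
    # tf.global_variables_initializer().run()  # <- running this AFTER restore would overwrite
    #                                          #    the restored values; I never call it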
Here is the code where the graph is run:
num_steps = 1000001

with tf.Session(graph=graph) as session:
    saver.restore(session, './checkpointsBook2VecCbowWindow2Downloaded/bookVec.ckpt')
    average_loss = 0
    saveIteration = 1

    for step in range(1, num_steps):
        batch_data, batch_labels = generate_batch(batch_size, cbow_window)
        feed_dict = {train_dataset: batch_data, train_labels: batch_labels}
        _, l = session.run([optimizer, loss], feed_dict=feed_dict)

        if step % 20000 == 0:
            recEpoch_indexA = epoch_index - recEpoch_indexA
            epochCount = tf.add(epochCount, recEpoch_indexA, name=None)
            recEpoch_indexA = epoch_index

            save_path = saver.save(session, "checkpointsBook2VecCbowWindow2/bookVec.ckpt")

            chptName = 'B2VCbowW2Embed256ckpt' + str(saveIteration)
            zipfolder(chptName, 'checkpointsBook2VecCbowWindow2')
            uploadModel.SetContentFile(chptName + ".zip")
            uploadModel.Upload()
            print("Checkpoint uploaded to Google Drive")
            saveIteration += 1
This is the code I use after training to check what was saved: I restore the graph from the checkpoint and print out all the variables in its variables collection.
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('./MODEL/bookVec.ckpt.meta')
    saver.restore(sess, './MODEL/bookVec.ckpt')
    for v in tf.get_default_graph().get_collection("variables"):
        print('From variables collection ', v)
And this is the output from the code above
From variables collection <tf.Variable 'embeddings:0' shape=(10001, 256) dtype=float32_ref>
From variables collection <tf.Variable 'softmax_weights:0' shape=(10001, 256) dtype=float32_ref>
From variables collection <tf.Variable 'softmax_biases:0' shape=(10001,) dtype=float32_ref>
As seen, epochCount has not been saved.
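As an extra check, the checkpoint file itself can also be listed without rebuilding the graph (a sketch; I am assuming tf.train.list_variables is available in this TensorFlow version):
for name, shape in tf.train.list_variables('./MODEL/bookVec.ckpt'):
    print('In checkpoint file: ', name, shape)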
Upvotes: 1
Views: 2467
Reputation: 10475
The reason the variable is restored as 0 is that it is actually never updated (i.e. it is restored correctly)! You are overwriting the Python name epochCount with the result of the tf.add call during the session, which only creates a new add operation and returns its output tensor; nothing is ever written back to the variable. That is, the variable (in the TensorFlow sense) is "orphaned" and will stay at 0 forever.
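To see this concretely, here is a toy sketch (not your exact code) showing that running tf.add does not change what the variable holds, which is why the Saver keeps writing out 0:
import tensorflow as tf

counter = tf.get_variable('counter_demo', initializer=0)
added = tf.add(counter, 5)   # a new tensor; nothing is written back to the variable

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(added))    # 5
    print(sess.run(counter))  # still 0 -- and 0 is what gets saved to the checkpoint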
You could use tf.assign to update the variable instead. It could look something like this:
# where you define the graph
epochCount = tf.get_variable('epochCount', initializer=0)
update_epoch = tf.assign(epochCount, epochCount + 1)
...

# after you launch the session
for step in range(1, num_steps):
    if step % 20000 == 0:
        sess.run(update_epoch)
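Since you are adding a computed amount (recEpoch_indexA) rather than 1, you could also feed the increment through a placeholder and use tf.assign_add; this is just a sketch, and epoch_increment is a made-up name:
# in the graph
epoch_increment = tf.placeholder(tf.int32, shape=[])
update_epoch = tf.assign_add(epochCount, epoch_increment)

# in the session, inside your step % 20000 block, before saver.save(...)
session.run(update_epoch, feed_dict={epoch_increment: recEpoch_indexA})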
Upvotes: 1