ylongqi

Reputation: 5

Unclear behavior for sampler in Tensorflow

The behavior of the candidate samplers implemented in TensorFlow, e.g. tf.nn.fixed_unigram_candidate_sampler, is not well defined in the documentation. For instance, I would expect the labels specified in true_classes to be excluded from the sampling pool, and the sampling to be conducted separately for each example in the batch. But according to my experiments, neither of these is true.

Consider the following code:

import tensorflow as tf

# Four examples with one true label each, shape [4, 1].
labels_matrix = tf.reshape(tf.constant([1, 2, 3, 4], dtype=tf.int64), [-1, 1])

sampled_ids, _, _ = tf.nn.fixed_unigram_candidate_sampler(
    true_classes=labels_matrix,
    num_true=1,
    num_sampled=1,
    unique=True,
    range_max=5,
    distortion=0.0,
    unigrams=list(range(5))  # explicit list, so this also works on Python 3
)

init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    print(sess.run([sampled_ids]))

The output can be 3, which belongs to the set of true classes. Also, the output has shape [1], which means that the sampling is conducted only once for the whole batch, not once per example.

Can someone help to clarify this?

Upvotes: 0

Views: 729

Answers (1)

Alexandre Passos

Reputation: 5206

The documentation for fixed_unigram_candidate_sampler does mention that true labels can be sampled. One of the values you've discarded as _ in your code, true_expected_count, is in fact the expected count of each true label among the sampled candidates.
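To make the two observations in the question concrete, here is a minimal pure-Python sketch of the behavior as described: one set of candidates is drawn for the whole batch, and true labels are not excluded. This is not TensorFlow's implementation. The helper names sample_candidates and accidental_hits are hypothetical, and the rejection loop is only one assumed way to imitate unique=True. If you need to mask the collisions in a real graph, tf.nn.compute_accidental_hits is the op to look at.

```python
import random


def sample_candidates(unigrams, distortion, num_sampled, rng):
    """Draw num_sampled distinct class ids ONCE for the whole batch.

    Hypothetical helper: the true classes are never consulted here,
    so any of them can be drawn.
    """
    # Each class is weighted by its unigram count raised to `distortion`;
    # distortion = 0.0 gives every class weight 1.0 (uniform sampling).
    weights = [float(u) ** distortion for u in unigrams]
    available = sum(1 for w in weights if w > 0)
    if num_sampled > available:
        raise ValueError("not enough classes with positive weight")

    chosen = []
    while len(chosen) < num_sampled:
        c = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if c not in chosen:  # imitate unique=True by rejecting repeats
            chosen.append(c)
    return chosen


def accidental_hits(true_classes, sampled):
    """Return (row, sampled_index) pairs where a sampled id is a true label."""
    hits = []
    for row, labels in enumerate(true_classes):
        for j, c in enumerate(sampled):
            if c in labels:
                hits.append((row, j))
    return hits


rng = random.Random(0)
true_classes = [[1], [2], [3], [4]]  # same labels as in the question
sampled = sample_candidates(range(5), 0.0, 1, rng)
print(sampled)                               # one list shared by all rows
print(accidental_hits(true_classes, sampled))
```

The length of sampled is num_sampled, not batch_size * num_sampled, which matches the shape [1] you observed. Any pair returned by accidental_hits marks a row whose true label was also drawn as a negative, and those are the entries you would mask out before computing a loss.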

Upvotes: 1
