Reputation: 267
The code works fine without num_epochs, but adding num_epochs produces this error:
OutOfRangeError (see above for traceback): RandomShuffleQueue '_1_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 2, current size 0)
[[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]
I've been following the official TensorFlow tutorial and can't get num_epochs to work.
What I want is for the pipeline to raise an error once it has gone past num_epochs, so that I don't have to track the current and maximum batch numbers myself by counting the instances in my whole training file, which is really big.
Any idea why this fails? I think I'm doing something wrong.
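In other words, I'd like to rely on a pattern like this (a sketch of what I'm aiming for, not working code):

try:
    while True:  # no need to know the number of batches in advance
        features, labels = sess.run([data_batch, label_batch])
        # ... train on the batch ...
except tf.errors.OutOfRangeError:
    print('Done: num_epochs passes over the data were produced.')

The full script I'm running: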
""" Some people tried to use TextLineReader for the assignment 1
but seem to have problems getting it to work, so here is a short
script demonstrating the use of CSV reader on the heart dataset.
Note that the heart dataset is originally in txt so I first
converted it to csv to take advantage of the already laid out columns.
You can download heart.csv in the data folder.
Author: Chip Huyen
Prepared for the class CS 20SI: "TensorFlow for Deep Learning Research"
cs20si.stanford.edu
"""
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import sys
sys.path.append('..')
import tensorflow as tf
DATA_PATH = './heart.csv'
BATCH_SIZE = 2
N_FEATURES = 9
def batch_generator(filenames):
    """ filenames is the list of files you want to read from.
    In this case, it contains only heart.csv
    """
    filename_queue = tf.train.string_input_producer(filenames, num_epochs=3)
    reader = tf.TextLineReader(skip_header_lines=1)  # skip the first line in the file
    _, value = reader.read(filename_queue)

    # record_defaults are the default values in case some of our columns are empty.
    # This also tells tensorflow the format of our data (the type of the decode result).
    # For this dataset, out of the 9 feature columns,
    # 8 are floats (some are integers, but to make our features homogeneous,
    # we consider them floats), and 1 is a string (at position 5).
    # The last column, corresponding to the label, is an integer.
    record_defaults = [[1.0] for _ in range(N_FEATURES)]
    record_defaults[4] = ['']
    record_defaults.append([1])

    # read in the 10 columns of data
    content = tf.decode_csv(value, record_defaults=record_defaults)

    # convert the 5th column (present/absent) to the binary value 0 and 1
    content[4] = tf.cond(tf.equal(content[4], tf.constant('Present')),
                         lambda: tf.constant(1.0), lambda: tf.constant(0.0))

    # pack all 9 features into a tensor
    features = tf.stack(content[:N_FEATURES])

    # assign the last column to label
    label = content[-1]

    # minimum number of elements in the queue after a dequeue, used to ensure
    # that the samples are sufficiently mixed
    # I think 10 times the BATCH_SIZE is sufficient
    min_after_dequeue = 10 * BATCH_SIZE

    # the maximum number of elements in the queue
    capacity = 20 * BATCH_SIZE

    # shuffle the data to generate BATCH_SIZE sample pairs
    data_batch, label_batch = tf.train.shuffle_batch([features, label],
                                                     batch_size=BATCH_SIZE,
                                                     capacity=capacity,
                                                     min_after_dequeue=min_after_dequeue)
    return data_batch, label_batch
def generate_batches(data_batch, label_batch):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        for _ in range(400):  # generate 400 batches
            features, labels = sess.run([data_batch, label_batch])
            print(features)
        coord.request_stop()
        coord.join(threads)

def main():
    data_batch, label_batch = batch_generator([DATA_PATH])
    generate_batches(data_batch, label_batch)

if __name__ == '__main__':
    main()
Upvotes: 0
Views: 799
Reputation: 893
An earlier answer and a follow-up comment already encouraged converting from tf.train.string_input_producer to the tf.data API for building input pipelines in TensorFlow v2.
If you can't afford the code rewrite right now, here is a quick fix: replacing import tensorflow as tf with the following should also work.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
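This runs the rest of the script under TF 1.x graph-mode semantics, so tf.train.string_input_producer() and the queue runners keep working unchanged. Note that you will still need the tf.local_variables_initializer() call from the answer below to make num_epochs work.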
Upvotes: 0
Reputation: 126154
The tf.train.string_input_producer()
uses a "local variable" in its implementation, so you need to add
sess.run(tf.local_variables_initializer())
...before starting the queue runners.
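For example, here is a minimal sketch of how that fix slots into your generate_batches() function (the try/except around the loop is an assumption about how you want to consume the OutOfRangeError raised after num_epochs passes over the data):

def generate_batches(data_batch, label_batch):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())  # initializes the epoch counter
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        try:
            while not coord.should_stop():
                features, labels = sess.run([data_batch, label_batch])
                print(features)
        except tf.errors.OutOfRangeError:
            # Raised after num_epochs passes over the data have been produced.
            pass
        finally:
            coord.request_stop()
            coord.join(threads)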
For usability reasons like this one, we now encourage TensorFlow users to use the tf.data
API for building input pipelines. Your code could be rewritten as follows:
# Start with a dataset of filenames.
dataset = tf.data.Dataset.from_tensor_slices(filenames)
# Repeat the filenames for three epochs.
dataset = dataset.repeat(3)
# Use Dataset.flat_map() and tf.data.TextLineDataset to convert the
# filenames into a dataset of lines.
dataset = dataset.flat_map(
    lambda filename: tf.data.TextLineDataset(filename).skip(1))
# Wrap the per-line parsing logic in a function, and map it over the dataset.
def parse_line(value):
    record_defaults = [[1.0] for _ in range(N_FEATURES)]
    record_defaults[4] = ['']
    record_defaults.append([1])
    # read in the 10 columns of data
    content = tf.decode_csv(value, record_defaults=record_defaults)
    # convert the 5th column (present/absent) to the binary value 0 and 1
    content[4] = tf.cond(tf.equal(content[4], tf.constant('Present')),
                         lambda: tf.constant(1.0), lambda: tf.constant(0.0))
    # pack all 9 features into a tensor
    features = tf.stack(content[:N_FEATURES])
    # assign the last column to label
    label = content[-1]
    return features, label
dataset = dataset.map(parse_line)
# Shuffle the dataset.
dataset = dataset.shuffle(20 * BATCH_SIZE)
# Combine consecutive elements into batches.
dataset = dataset.batch(BATCH_SIZE)
# Create an iterator to get elements from the dataset.
iterator = dataset.make_one_shot_iterator()
# Get tensors that represent the next element of the iterator.
data_batch, label_batch = iterator.get_next()
# Finally, create a session to iterate over the batches.
with tf.Session() as sess:
    try:
        while True:
            features, labels = sess.run([data_batch, label_batch])
            print(features)
    except tf.errors.OutOfRangeError:
        # Raised when there are no more batches to produce.
        pass
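Note that Dataset.shuffle(20 * BATCH_SIZE) plays the role of the capacity and min_after_dequeue arguments in the original shuffle_batch() call. As a side note, under TF 2.x eager execution the iterator and session boilerplate goes away entirely, since a dataset is directly iterable (a sketch, assuming the parsing logic is ported to the tf.io.* names):

# TF 2.x sketch: iterate the dataset directly, no Session or iterator needed.
for features, labels in dataset:
    print(features.numpy())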
Upvotes: 1