Wilrick B

Reputation: 135

feed_dict doesn't accept my data

I've been trying to feed my own images into some TensorFlow code to see how it reacts to them instead of the MNIST set. I've been able to import images (I think) into TensorFlow, but I have two placeholders that should receive my image data and label data. I tried to use feed_dict (which still seems right to me) to get my data into the rest of my code, but it won't accept any data I feed it. I know I can't feed it a Tensor, and apparently not a batch either, but the only way I can think of to make this work is to feed it a list. I saw that feed_dict accepts NumPy arrays, but I'm not sure how to approach converting my data to a NumPy array.

I'm new to TensorFlow and Python, so please forgive any mistakes; I'm still learning how everything works.

with tf.name_scope('Image_Data_Input'):
    def read_labeled_image_list(image_list_file):
        print('read_labeled_image_list function opened')
        # "with" closes the file once the loop is done
        with open(image_list_file, 'r') as f:
            print('image_list_file opened')
            filenames = []
            labels = []
            print('Arrays formed')
            for line in f:
                # rstrip instead of line[:-1], so a last line without "\n" keeps its final character
                filename, label = line.rstrip('\n').split(' ')
                filenames.append(filename)
                labels.append(label)
        print('Lines deconstructed')
        return filenames, labels

def read_image(input_queue):
    label = input_queue[1]
    file_contents = tf.read_file(input_queue[0])
    decoded_image = tf.image.decode_jpeg(file_contents, channels=3)
    print('Image decoded from JPEG')
    decoded_image.set_shape([2560, 1440, 3])
    decoded_image = tf.image.resize_images(decoded_image, [128, 128])
    return decoded_image, label

image_list, label_list = read_labeled_image_list(image_list_file)
images = tf.convert_to_tensor(image_list, dtype=tf.string)
labels = tf.convert_to_tensor(label_list, dtype=tf.string)

input_queue = tf.train.slice_input_producer([images, labels], num_epochs=None, shuffle=True)

image, label = read_image(input_queue)

The indentation behaved a little weirdly when I pasted my code, so I'm not sure everything is properly placed.

Now, I have these placeholders:

with tf.name_scope('input'):
  x = tf.placeholder(tf.float32, shape=[None, 784])
  y_ = tf.placeholder(tf.float32, shape=[None, 10])

And I've seen code routing data to those placeholders this way:

batch_x, batch_y = tf.train.batch([image, label], batchsize)
#_, summary = sess.run([train_writer, summary_op], feed_dict={x: batch_x, y_: batch_y})

But I can't seem to make this work.

Does anyone have any idea how I could make this work?

Again, sorry for any mistakes, and thanks in advance.

Upvotes: 0

Views: 776

Answers (1)

Dylan F

Reputation: 1305

As the error says, you can't feed tensors into a placeholder, and batch_x and batch_y are tensors. The newer tf.data.Dataset API is the preferred way to input data into a model (see the TensorFlow guide on importing data). I think Dataset.from_tensor_slices would require minimal rewriting. Short of that, build the graph so that batch_x and batch_y flow directly into the model you're using; then you don't need placeholders at all.
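A minimal sketch of the Dataset.from_tensor_slices route, assuming TensorFlow 2.x function names (under TF 1.x, tf.read_file and tf.image.decode_jpeg/tf.image.resize_images from the question play the same roles) and assuming the labels have already been converted to integer class ids. make_dataset is a hypothetical helper name, not a TensorFlow API. It stands in for slice_input_producer plus tf.train.batch, and the (image, label) batches it yields are tensors meant to flow straight into the model rather than through placeholders.

```python
import tensorflow as tf  # assumes TF 2.x names (tf.io.*, tf.image.resize)


def make_dataset(filenames, labels, batch_size, image_size=(128, 128)):
    """Hypothetical helper: batched (image, label) Dataset from JPEG paths and int labels."""
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))

    def load(path, label):
        contents = tf.io.read_file(path)
        image = tf.io.decode_jpeg(contents, channels=3)
        # tf.image.resize returns float32; scale pixel values into [0, 1]
        image = tf.image.resize(image, image_size) / 255.0
        return image, label

    # Shuffle the (cheap) path/label pairs before decoding any images.
    dataset = dataset.shuffle(buffer_size=len(filenames))
    dataset = dataset.map(load)
    dataset = dataset.repeat()  # repeat forever; the training loop decides when to stop
    return dataset.batch(batch_size)
```

Shuffling before map keeps the shuffle buffer small, since it holds file paths instead of decoded images.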

I don't recommend this, but for completeness I want to mention another method. You could:

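# tf.train.batch is queue-based: without started queue runners, sess.run on it hangs.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
numpy_batch_x, numpy_batch_y = sess.run([batch_x, batch_y])
_, summary = sess.run([train_writer, summary_op],
    feed_dict={x: numpy_batch_x, y_: numpy_batch_y})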
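If you do go the feed_dict route, the NumPy arrays must match the placeholder shapes: x expects [None, 784] (a flattened 28x28 MNIST image) and y_ expects [None, 10] one-hot rows, while the question's images are 128x128x3 and its labels are strings. A NumPy sketch, assuming the labels have already been converted to integer class ids in [0, 10) (e.g. with int(label)); to_placeholder_arrays is a hypothetical helper:

```python
import numpy as np


def to_placeholder_arrays(images, labels, num_classes=10):
    """Hypothetical helper: flatten an image batch and one-hot encode integer labels.

    images: array of shape (batch, height, width, channels)
    labels: sequence of integer class ids in [0, num_classes)
    """
    images = np.asarray(images, dtype=np.float32)
    flat = images.reshape(images.shape[0], -1)  # (batch, height * width * channels)
    labels = np.asarray(labels, dtype=np.int64)
    one_hot = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    one_hot[np.arange(labels.shape[0]), labels] = 1.0  # one 1.0 per row
    return flat, one_hot


batch = np.zeros((4, 128, 128, 3), dtype=np.uint8)
flat, one_hot = to_placeholder_arrays(batch, [0, 3, 9, 3])
print(flat.shape)     # (4, 49152)
print(one_hot.shape)  # (4, 10)
```

A flattened 128x128x3 image has 49152 values, so x would have to be declared with shape [None, 49152] (or the model changed to accept [None, 128, 128, 3]) before arrays like these can be fed.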

PS: If train_writer is a tf.summary.FileWriter, it isn't something you pass to sess.run; I think you want:

summary = sess.run(summary_op, ...)
train_writer.add_summary(summary, step)  # step: your training step counter
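A small runnable sketch of that FileWriter pattern, written with tf.compat.v1 so it also runs under TensorFlow 2.x (inside a tf.Graph context, eager execution is off). The scalar named fake_loss is a hypothetical stand-in for a real summary. The points it shows: the FileWriter is never run by the session, and running summary_op on its own returns one serialized Summary rather than a list.

```python
import tempfile

import tensorflow as tf

logdir = tempfile.mkdtemp()
with tf.Graph().as_default():
    step_value = tf.compat.v1.placeholder(tf.float32, shape=[])
    # Hypothetical summary standing in for a real loss.
    tf.compat.v1.summary.scalar("fake_loss", step_value * 2.0)
    summary_op = tf.compat.v1.summary.merge_all()

    with tf.compat.v1.Session() as sess:
        train_writer = tf.compat.v1.summary.FileWriter(logdir, sess.graph)
        for step in range(3):
            # Fetching summary_op alone gives serialized Summary bytes, not a list.
            summary = sess.run(summary_op, feed_dict={step_value: float(step)})
            train_writer.add_summary(summary, global_step=step)
        train_writer.close()
```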

EDIT: In response to confusion on the dataset API, I am going to show how to handle this with a Dataset. I am going to use TFRecords. It may not be the simplest solution, but it's one way.

import numpy as np
import tensorflow as tf
import cv2  # Again, others to choose from; the code below calls cv2.resize.
from imageio import imread  # scipy.misc.imread was removed in SciPy 1.2; others would work here too.

def read_labeled_image_list(image_list_file):
    # See question
    return filenames, labels

def make_tfr(tfr_dir="/YOUR/PREFERRED/TFR/DIR"):
    # tfr_dir is the path of the .tfrecord file that gets written.
    def _int64_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    writer = tf.python_io.TFRecordWriter(tfr_dir)
    all_image_paths, all_labels = read_labeled_image_list(...)  # pass your image list file
    for path, label in zip(all_image_paths, all_labels):
        disk_im = imread(path)
        # Assumes 3-channel images; input_pipeline reshapes to (128, 128, 3).
        resized_im = cv2.resize(disk_im, (128, 128))
        raw_im = resized_im.tobytes()
        # Construct an example proto-obj,
        example = tf.train.Example(
            # which wants a Features proto-obj,
            features=tf.train.Features(
                # which wants a dict.
                feature={
                    'image_raw': _bytes_feature(raw_im),
                    # Labels come out of the text file as strings.
                    'label': _int64_feature(int(label))
        })) # close your example object
        serialized = example.SerializeToString()
        writer.write(serialized)
    writer.close()

make_tfr()  # After you've done it successfully once, comment out.

def input_pipeline(batch_size, epochs, tfr_dir="/YOUR/PREFERRED/TFR/DIR"):
    # with tf.name_scope("Input"):  maybe you like to scope as much as I do?
    dataset = tf.data.TFRecordDataset(tfr_dir)

    def parse_protocol_buffer(example_proto):
        features = {'image_raw': tf.FixedLenFeature((), tf.string),
                    'label': tf.FixedLenFeature((), tf.int64)}
        parsed_features = tf.parse_single_example(
            example_proto, features)
        return parsed_features['image_raw'], parsed_features['label']

    dataset = dataset.map(parse_protocol_buffer)

    def convert_parsed_proto_to_input(image_string, label):
        image_decoded = tf.decode_raw(image_string, tf.uint8)
        image_resized = tf.reshape(image_decoded, (128, 128, 3))
        image = tf.cast(image_resized, tf.float32)
        # I usually put my image elements in [-1, 1]
        return image * (2. / 255) - 1, label

    dataset = dataset.map(convert_parsed_proto_to_input)
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.repeat(epochs)  # epochs=None would repeat indefinitely
    dataset = dataset.batch(batch_size)
    return dataset

def model(image_tensor):
    ...
    # However you want to do this.
    return predictions

def loss(predictions, labels):
    ...
    return some_loss

def train(some_loss):
    ...
    return train_op

batch_size = 50
iterations = 10000
train_dataset = input_pipeline(batch_size, iterations)
train_iterator = train_dataset.make_initializable_iterator()
image, label = train_iterator.get_next()
predictions = model(image)
loss_op = loss(predictions, label)
train_op = train(loss_op)
summary_op = tf.summary.merge_all()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter("/YOUR/LOGDIR", sess.graph)
    sess.run(train_iterator.initializer)
    for step in range(iterations):
        _, summary = sess.run([train_op, summary_op])
        train_writer.add_summary(summary, step)

You say you're new to TensorFlow; I hope this doesn't intimidate you. I was new to TensorFlow not long ago, and it was a pain to figure out how to make a good input pipeline. Learning TFRecords seemed impossible. You also say you're new to Python, so I'll warn you that cv2 has a reputation for difficult installs. You may want to look into other ways to resize an image (though I'd advise against PIL, which is probably even more confusing and difficult at first).
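One alternative to cv2 for resizing, sketched under the assumption that TensorFlow 2.x (eager execution) is already installed: tf.image.resize. resize_uint8 is a hypothetical helper. Note that tf.image.resize takes the target size as (height, width), whereas cv2.resize takes (width, height); that makes no difference for a square 128x128 target but does for anything else.

```python
import numpy as np
import tensorflow as tf  # TF 2.x, eager execution


def resize_uint8(image, size=(128, 128)):
    """Hypothetical helper: resize an HxWxC uint8 NumPy image with TensorFlow instead of cv2."""
    resized = tf.image.resize(image, size)  # bilinear by default; returns float32
    # Round and clip back into the uint8 range before casting.
    resized = tf.clip_by_value(tf.round(resized), 0.0, 255.0)
    return tf.cast(resized, tf.uint8).numpy()


screenshot = np.full((144, 256, 3), 200, dtype=np.uint8)
small = resize_uint8(screenshot)  # shape (128, 128, 3), dtype uint8
```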

Basically, I'm posting this code because the documentation on writing TFRecords is confusing (Exhibit A vs. a blog post that helped me figure it out), but TFRecords is the way I know best to make a Dataset. Even if you don't go the TFRecords route, this could help you with the map function for datasets; e.g., notice how I pass label through convert_parsed_proto_to_input even though it isn't used there. Making a dataset (specifically from TFRecords) takes a lot of lines of code, but Dataset is the preferred way to construct an input pipeline, and it's designed to replace the old queue method you're using.
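To see the writing and reading halves of the TFRecords route side by side, here is a minimal round-trip sketch, assuming TensorFlow 2.x names (the TF 1.x equivalents are tf.python_io.TFRecordWriter, tf.parse_single_example and tf.decode_raw, as used in the answer's code). write_records and read_records are hypothetical helpers; the feature keys match the answer's image_raw and label.

```python
import tensorflow as tf  # TF 2.x names


def write_records(path, images, labels):
    """Hypothetical helper: serialize uint8 images and int labels into one TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for image, label in zip(images, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                "image_raw": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[image.tobytes()])),
                "label": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[int(label)])),
            }))
            writer.write(example.SerializeToString())


def read_records(path, shape):
    """Hypothetical helper: parse the file back into (uint8 image, int64 label) pairs."""
    features = {
        "image_raw": tf.io.FixedLenFeature((), tf.string),
        "label": tf.io.FixedLenFeature((), tf.int64),
    }

    def parse(serialized):
        parsed = tf.io.parse_single_example(serialized, features)
        # decode_raw yields a flat uint8 vector; the shape has to be restored by hand.
        image = tf.reshape(tf.io.decode_raw(parsed["image_raw"], tf.uint8), shape)
        return image, parsed["label"]

    return tf.data.TFRecordDataset(path).map(parse)
```

Because only raw bytes are stored, the reader has to know the image shape in advance, which is why the answer resizes every image to 128x128x3 before writing.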

As a side note, the purpose of the queue strategy was to read data from memory directly into the graph without placeholders. Placeholders are slow and memory-intensive compared to the queue strategy, but Datasets are even better when implemented correctly.

I see in your comment that you want to see the placeholder name scope get connected to your graph. With the Dataset approach, you'll see some dataset nodes on the graph; if you scope them with the name_scope I commented out, it should be apparent that everything's hooked up right. With your approach, you're actually adding this queue-and-preprocess structure onto the graph. Since you'll have to pull the images out of the graph as NumPy arrays to pass them into a placeholder, it won't be apparent that your data is flowing correctly.

Now, as I mentioned in the original post, you can just pass batch_x and batch_y into your model and forget about placeholders and datasets altogether. If the queue is implemented right, you'll see everything hooked up from the preprocessing stage onward. Still, your images are large (2560x1440) before resizing, so reading them will be an intensive task. I'd recommend going the hard route of learning to use Datasets and TFRecords.
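A minimal sketch of that no-placeholder wiring, using a Dataset iterator in place of tf.train.batch and written with tf.compat.v1 so it runs in graph mode under TensorFlow 2.x. The random features, the linear classifier and the learning rate are illustrative assumptions standing in for the question's images and model; the part that carries over is that the get_next() tensors feed the model directly, so sess.run needs no feed_dict.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
with tf.Graph().as_default():
    # Toy data standing in for flattened, preprocessed images and integer labels.
    features = rng.random((100, 8)).astype(np.float32)
    targets = (features.sum(axis=1) > 4).astype(np.int64)

    dataset = tf.data.Dataset.from_tensor_slices((features, targets))
    dataset = dataset.shuffle(100).repeat().batch(20)
    iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
    batch_x, batch_y = iterator.get_next()  # tensors, wired straight into the model

    # A linear classifier fed directly from the iterator: no placeholders, no feed_dict.
    weights = tf.compat.v1.get_variable(
        "weights", shape=[8, 2], initializer=tf.compat.v1.zeros_initializer())
    bias = tf.compat.v1.get_variable(
        "bias", shape=[2], initializer=tf.compat.v1.zeros_initializer())
    logits = tf.matmul(batch_x, weights) + bias
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=batch_y, logits=logits))
    train_op = tf.compat.v1.train.GradientDescentOptimizer(0.5).minimize(loss)

    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        # Each run pulls a fresh batch from the iterator on its own.
        losses = [sess.run([train_op, loss])[1] for _ in range(200)]
```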

I hope this helps you implement a Dataset in your code. I hope this helps you get TensorBoard running. And I hope this helps you figure out TFRecords if you decide to go that route.

PS: On the topic of using TensorBoard to validate that the model is working, you could add tf.summary.image('input', image_tensor) as the first line of model(...). Then check the Images dashboard and see whether the images are what you expect.

EDIT 2: Condensed, the nesting in make_tfr is example = tf.train.Example(features=tf.train.Features(feature={...})), where {...} is the dict of features.
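A sketch of that check, assuming the batched images arrive as a 4-D [batch, height, width, channels] float tensor (which the v1 image summary requires) and using tf.compat.v1 so it runs in graph mode under TensorFlow 2.x. The body of model here is a hypothetical stand-in that just averages each image.

```python
import tensorflow as tf


def model(image_tensor):
    """Hypothetical model whose first line logs its input batch for TensorBoard."""
    # Writes up to 3 images per step to the Images dashboard.
    tf.compat.v1.summary.image("input_images", image_tensor, max_outputs=3)
    # Stand-in for the real network: one mean value per image.
    return tf.reduce_mean(image_tensor, axis=[1, 2, 3])


with tf.Graph().as_default():
    # Values in [-1, 1], like the answer's convert_parsed_proto_to_input output.
    images = tf.random.uniform([4, 128, 128, 3], minval=-1.0, maxval=1.0)
    predictions = model(images)
    summary_op = tf.compat.v1.summary.merge_all()
    with tf.compat.v1.Session() as sess:
        summary_bytes, preds = sess.run([summary_op, predictions])
```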

Upvotes: 1
