D. Rusk

Reputation: 91

How do I write an encoded JPEG as bytes to a TensorFlow TFRecord file and then read it back?

I am trying to use TensorFlow's TFRecord format to store my datasets.

I managed to read in JPEG images, decode them to raw pixel data, and write them to a TFRecord file. I can then read them back using tf.decode_raw.

The problem is that this leads to huge file sizes, because I am storing the images as raw data. I have seen many tutorials and blog posts saying that I can store the images in their encoded form and decode them when reading, but I can't find a working example of this. I have been trying for a while, and whichever way I do it I get formatting errors.

TL;DR: Does anyone know how to write images to a TFRecord file as JPEGs rather than as raw data?

Thank you, David.

My writing function.

import tensorflow as tf

# wrap_bytes, wrap_int64 and print_progress are helper functions defined elsewhere.
def convert(image_paths, labels, out_path):
    num_images = len(image_paths)

    with tf.python_io.TFRecordWriter(out_path) as writer:
        for i, (path, label) in enumerate(zip(image_paths, labels)):
            print_progress(count=i, total=num_images-1)

            # Read the file content as-is, i.e. still JPEG-encoded.
            img = open(path, 'rb').read()

            data = {'image': wrap_bytes(img),
                    'label': wrap_int64(label)}

            feature = tf.train.Features(feature=data)
            example = tf.train.Example(features=feature)
            serialized = example.SerializeToString()
            writer.write(serialized)

Convert dataset with this:

convert(image_paths=image_paths_train,
        labels=cls_train,
        out_path=path_tfrecords_train)

My reading function.

def parse(serialized):
    features = {
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64)
    }
    parsed_example = tf.parse_single_example(serialized=serialized,
                                             features=features)

    # Get the image as encoded bytes (the JPEG file content).
    image_raw = parsed_example['image']

    # Decode the encoded bytes into a uint8 image tensor.
    image = tf.image.decode_image(image_raw, channels=3)

    # Previous approach, used when the images were stored as raw pixels:
    # image = tf.decode_raw(image_raw, tf.uint8)

    # The type is now uint8 but we need it to be float.
    image = tf.cast(image, tf.float32)

    # Get the label associated with the image.
    label = parsed_example['label']

    # The image and label are now correct TensorFlow types.
    return image, label

Upvotes: 4

Views: 5776

Answers (1)

GPhilo

Reputation: 19123

For writing, just open the file as a binary file (fp = open('something.jpg', 'rb')) and .read() its content. Store that content in the tfrecord Example as you store the image now (i.e., as a bytes feature).
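The writing side might look roughly like the sketch below. It is only an illustration under some assumptions: the helper names wrap_bytes and wrap_int64 are hypothetical stand-ins (their bodies here are a guess at what such helpers usually do), make_example is a name made up for this sketch, and tf.io.TFRecordWriter is used as the name under which recent TensorFlow versions expose the record writer.

```python
import tensorflow as tf


def wrap_bytes(value):
    # Hypothetical helper: wrap a Python bytes object in a bytes feature.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def wrap_int64(value):
    # Hypothetical helper: wrap an integer in an int64 feature.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))


def make_example(jpeg_bytes, label):
    # jpeg_bytes is the file content exactly as read from disk, still encoded.
    features = tf.train.Features(feature={
        'image': wrap_bytes(jpeg_bytes),
        'label': wrap_int64(label),
    })
    return tf.train.Example(features=features)


def convert(image_paths, labels, out_path):
    # Write one serialized Example per image; nothing is decoded here.
    with tf.io.TFRecordWriter(out_path) as writer:
        for path, label in zip(image_paths, labels):
            with open(path, 'rb') as fp:
                jpeg_bytes = fp.read()
            writer.write(make_example(jpeg_bytes, label).SerializeToString())
```

Because the bytes are never decoded on the way in, each record is about as large as the JPEG file it came from.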

For reading, instead of doing decode_raw, use tf.image.decode_image and pass in the tensor you get from the sample reader.
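A minimal sketch of the reading side, assuming each record holds an 'image' bytes feature with the encoded file content and a 'label' int64 feature (those key names are an assumption, as is the use of the tf.io.* names that recent TensorFlow versions provide for the parsing functions):

```python
import tensorflow as tf


def parse(serialized):
    # Describe the two features stored in each serialized Example.
    features = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized=serialized,
                                        features=features)

    # The stored string is the encoded image file; decode it to uint8 pixels.
    image = tf.image.decode_image(parsed['image'], channels=3)

    # Cast to float for further processing.
    image = tf.cast(image, tf.float32)
    return image, parsed['label']
```

A function like this can be mapped over the records, e.g. tf.data.TFRecordDataset(path_to_tfrecords).map(parse), where path_to_tfrecords is a placeholder for your file. If the file only ever contains JPEGs, tf.image.decode_jpeg is an alternative to decode_image; it may be easier to combine with ops such as resizing, since the result of decode_image does not always carry a static shape.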

If you post your code, I can give more specific examples; without seeing what your code looks like, this is as detailed as I can get.

Upvotes: 4
