anemone

Reputation: 51

Extracting images from tfrecords files with protobuf without running a TensorFlow session

I'm using TensorFlow in Python, and my data is stored in TFRecords files containing tf.train.Example protocol buffers. I'm trying to extract the fields stored in each example (in the code below: height, width, image) without running a TensorFlow session. By trial and error I found the following code, which seems to work:

import numpy as np
import tensorflow as tf

def _im_feature_to_im(example, key, height, width):
    # serialize the bytes_list message, then strip what looks like a 4-byte header
    feature_ser = example.features.feature[key].bytes_list.SerializeToString()
    feature_ser_clean = feature_ser[4:]
    image = np.fromstring(feature_ser_clean, dtype=np.uint8).reshape((height, width))
    return image

for serialized_example in tf.python_io.tf_record_iterator(tfrec_filename):
    example = tf.train.Example()
    example.ParseFromString(serialized_example)
    # traverse the Example format to get data
    height = example.features.feature['height'].int64_list.value[0]
    width = example.features.feature['width'].int64_list.value[0]
    image = _im_feature_to_im(example, 'image', height, width)

So: integer fields are extracted easily. But my question is about extracting the image: why do I have to remove 4 bytes from the start of the byte array in order to get the original image? Is there some header there?

Upvotes: 2

Views: 1924

Answers (2)

Ciprian Tomoiagă

Reputation: 3990

What you are doing in _im_feature_to_im() is encoding a message to a string by calling .SerializeToString() and then decoding it by hand, by removing the first 4 bytes (or, as you said in the comment, by removing all the bytes with the MSB set). That is a redundant round-trip: the bytes you want are already stored in the message.

Instead, you can get your image by accessing the value property:

image_string = example.features.feature[key].bytes_list.value[0]

Note that this is an array of one element, hence the [0] at the end.
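In fact, you can verify that the serialized form you were slicing is just a short wire-format header followed by exactly these bytes (a quick check, reusing image_string from above):

ser = example.features.feature[key].bytes_list.SerializeToString()
# the serialized message is a tag byte and a varint length followed by the
# payload itself, so ser[4:] only equals image_string when the length varint
# happens to take 3 bytes (i.e. for large images)
assert ser.endswith(image_string)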

You can then construct the array from this, like you did:

image_arr = np.frombuffer(image_string, dtype=np.uint8)
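Putting it together, the loop from the question simplifies to this (a sketch reusing the question's names, and assuming the image was stored as raw uint8 bytes):

for serialized_example in tf.python_io.tf_record_iterator(tfrec_filename):
    example = tf.train.Example()
    example.ParseFromString(serialized_example)
    height = example.features.feature['height'].int64_list.value[0]
    width = example.features.feature['width'].int64_list.value[0]
    image_string = example.features.feature['image'].bytes_list.value[0]
    # nothing to strip: value[0] is exactly the bytes that were stored
    image = np.frombuffer(image_string, dtype=np.uint8).reshape((height, width))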

Now, images are often put in tfrecords in an encoded representation (e.g. PNG or JPEG), because it takes significantly less space than the raw bytes. In that case you also need to decode the image. TensorFlow has the tf.image.decode_image(...) function for this, but it returns a tensor, and you want to do this without a TF session.

You can use OpenCV to decode the image representation without a TF Session:

import cv2
image = cv2.imdecode(image_arr, cv2.IMREAD_UNCHANGED)
assert image is not None, "Could not decode image"
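If you'd rather not depend on OpenCV, Pillow can do the same decoding (an alternative sketch, not part of the original answer; it assumes image_string holds PNG- or JPEG-encoded bytes):

import io
import numpy as np
from PIL import Image

# Pillow decodes the compressed bytes; np.asarray converts to an ndarray
image = np.asarray(Image.open(io.BytesIO(image_string)))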

Upvotes: 0

Sherry Moore

Reputation: 64

Those bytes are the field key (and length) from the protocol buffer encoding.

https://developers.google.com/protocol-buffers/docs/encoding

You can print the bytes out and follow the instructions at the page above to decode them. Most likely they are the encoding of tag = 1, wire type = 2 (length-delimited), followed by a varint length equal to height * width.
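For example, here is a minimal sketch that decodes that header by hand (assuming Python 3, where indexing bytes yields ints, and a 640x480 image, so the payload length is 307200):

ser = example.features.feature['image'].bytes_list.SerializeToString()
tag = ser[0]                          # 0x0A == (field_number 1 << 3) | wire_type 2
field_number, wire_type = tag >> 3, tag & 0x07
# a varint length follows: 7 payload bits per byte, MSB set on all but the last
length, shift, i = 0, 0, 1
while True:
    b = ser[i]
    length |= (b & 0x7F) << shift
    i += 1
    if not (b & 0x80):
        break
    shift += 7
print(field_number, wire_type, length, i)   # -> 1 2 307200 4, i.e. a 4-byte header

This also explains the comment about "all the bytes with the MSB set": those are the continuation bytes of the varint length.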

Hope that helps!

Sherry

Upvotes: 2
