WY Hsu

Reputation: 1905

Why is an image (numpy array) converted to a string before being encoded into a TFRecord file?

Recently, I've been working on encoding images (let's say in bitmap format) into a TFRecord file.

But I'm wondering about one thing:

why do we need to convert the numpy array data into a string type

before the data is written into the TFRecord file?

Like this:

from PIL import Image
import numpy as np
...
npimg = np.array(Image.open(img_path))
# My question:
# why do we need to convert the numpy array image to a string?
img_raw = npimg.tostring()
...
# later on, write img_raw to tf.train.Example

Here's the full code example that I found in the blog post Tfrecords Guide.

from PIL import Image
import numpy as np
import skimage.io as io
import tensorflow as tf


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

tfrecords_filename = 'pascal_voc_segmentation.tfrecords'

writer = tf.python_io.TFRecordWriter(tfrecords_filename)

original_images = []
filename_pairs = [
     ('/path/to/example1.jpg',
      '/path/to/example2.jpg'),
     ...,
     ('/path/to/exampleN.jpg',
      '/path/to/exampleM.jpg'),
]

for img_path, annotation_path in filename_pairs:

    # read data into numpy array
    img = np.array(Image.open(img_path))
    annotation = np.array(Image.open(annotation_path))

    height = img.shape[0]
    width = img.shape[1]

    original_images.append((img, annotation))

    # My question:
    # why do we need to convert the numpy array image to a string?
    img_raw = img.tostring()
    annotation_raw = annotation.tostring()

    example = tf.train.Example(features=tf.train.Features(feature={
        'height': _int64_feature(height),
        'width': _int64_feature(width),
        'image_raw': _bytes_feature(img_raw),
        'mask_raw': _bytes_feature(annotation_raw)}))

    writer.write(example.SerializeToString())

writer.close()

Any hint would be appreciated. Thanks in advance.

Upvotes: 1

Views: 764

Answers (1)

Vedanshu

Reputation: 2296

To read data efficiently it can be helpful to serialize your data and store it in a set of files (100-200MB each) that can each be read linearly. This is especially true if the data is being streamed over a network. This can also be useful for caching any data-preprocessing.
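For a concrete picture of what that serialized record holds, here is a minimal read-back sketch (using the same TF 1.x tf.python_io API as the question, and assuming the images were written as uint8 arrays): the bytes feature is just the raw buffer, and the stored height/width are needed to reshape it back into an image.

import numpy as np
import tensorflow as tf

record_iterator = tf.python_io.tf_record_iterator(path='pascal_voc_segmentation.tfrecords')

for string_record in record_iterator:
    # each record is one serialized tf.train.Example
    example = tf.train.Example()
    example.ParseFromString(string_record)

    height = int(example.features.feature['height'].int64_list.value[0])
    width = int(example.features.feature['width'].int64_list.value[0])
    img_string = example.features.feature['image_raw'].bytes_list.value[0]

    # np.frombuffer turns the raw bytes back into a flat array;
    # the shape has to come from the other features (uint8 is an assumption here)
    img = np.frombuffer(img_string, dtype=np.uint8).reshape((height, width, -1))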

Edit: This comes in handy when you are transferring the image to a server (e.g. TensorFlow Serving). There you have to send the data as a serialized string because some media are made for streaming text. You never know -- some protocols may interpret your binary data as control characters (like a modem), or your binary data could be corrupted because the underlying protocol might think that you've entered a special character combination (like how FTP translates line endings).

So to get around this, people encode the binary data into characters. Base64 is one of these types of encodings.

Why 64? Because you can generally rely on the same 64 characters being present in many character sets, and you can be reasonably confident that your data's going to end up on the other side of the wire uncorrupted.
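As a small, self-contained illustration of that point (separate from the TFRecord code above, and using a placeholder file path), base64 maps arbitrary binary bytes onto a safe character alphabet and decodes back to exactly the same bytes:

import base64

with open('/path/to/example1.jpg', 'rb') as f:  # placeholder path
    raw_bytes = f.read()

encoded = base64.b64encode(raw_bytes)  # ASCII-safe text representation
decoded = base64.b64decode(encoded)    # round-trips to the original bytes

assert decoded == raw_bytes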

Upvotes: 2
