I'm trying to get TFDV working with RGB images as feature inputs, reading from a TFRecords file. I can read and write the image data to TFRecord files fine. Here are the relevant code snippets for writing, where img is a NumPy [32, 32, 3] array:
feature = {'train/label': _int64_feature(y_train[i]),
'train/image': _bytes_feature(tf.compat.as_bytes(img.tobytes()))
}
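For reference, the write side can be sketched as a self-contained example. The question does not show `_int64_feature` and `_bytes_feature`; the versions below follow the pattern used in the TensorFlow TFRecord tutorial and are an assumption about what the original helpers look like. `make_example` is a hypothetical name. `tobytes()` is used because `ndarray.tostring()` is a deprecated alias for it.

```python
import numpy as np
import tensorflow as tf

# Helpers in the style of the TensorFlow TFRecord tutorial (assumed to match
# the question's _int64_feature / _bytes_feature, which are not shown).
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def make_example(img, label):
    """Hypothetical helper: pack one uint8 [32, 32, 3] image and its label."""
    feature = {
        'train/label': _int64_feature(label),
        # Raw pixel bytes: arbitrary binary data, not text.
        'train/image': _bytes_feature(img.tobytes()),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Usage: serialize one example, as a TFRecordWriter would write it.
img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
serialized = make_example(img, label=7).SerializeToString()
```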
And reading back:
read_features = {'train/label': tf.io.FixedLenFeature([], tf.int64),
'train/image': tf.io.FixedLenFeature([], tf.string)}
I can then use np.frombuffer and reshape to get my image back correctly.
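The frombuffer/reshape step might look like the sketch below. `decode_image` is a hypothetical helper, and the uint8 dtype and [32, 32, 3] shape are assumptions that must match whatever was used when writing, since neither is stored in the record itself.

```python
import numpy as np

def decode_image(raw_bytes, shape=(32, 32, 3)):
    """Hypothetical helper: rebuild a uint8 image from the 'train/image' bytes.

    The record stores only raw bytes, so dtype and shape must be supplied
    here and must match the values used on the write side.
    """
    # np.frombuffer returns a read-only view; call .copy() to modify it.
    return np.frombuffer(raw_bytes, dtype=np.uint8).reshape(shape)

# Usage: round-trip a deterministic test image through its raw bytes.
img = (np.arange(32 * 32 * 3) % 256).astype(np.uint8).reshape(32, 32, 3)
restored = decode_image(img.tobytes())
```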
The issue is that when I run tfdv.generate_statistics_from_tfrecord() on that TFRecords file, it throws an error:
ValueError: '\xff ...... \x87' has type str, but isn't valid UTF-8 encoding. Non-UTF-8 strings must be converted to unicode objects before being added. [while running 'GenerateStatistics/RunStatsGenerators/TopKStatsGenerator/TopK_ConvertToSingleFeatureStats']
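The traceback points at TFDV's TopKStatsGenerator, which appears to treat the bytes feature as a string and so needs it to be valid UTF-8. Raw pixel values generally are not: a byte such as 0xff can never appear in valid UTF-8. The stdlib sketch below (`is_valid_utf8` is a hypothetical helper) shows the check that such bytes fail.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: return True if `data` decodes as UTF-8."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

# Text-like bytes decode fine.
print(is_valid_utf8(b'cat'))               # True
# Pixel values such as 0xff are never valid UTF-8 bytes.
print(is_valid_utf8(bytes([0xff, 0x87])))  # False
```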
I've tried all kinds of different ways of writing the images, using astype(unicode) and more, but I can't get this working.
Any ideas, please?
Thanks, Paul
Try the following, which stores the image file's encoded bytes instead of the raw pixel array:
with open(image_location, 'rb') as f:
    image_string = f.read()
feature = {'train/label': _int64_feature(y_train[i]),
'train/image': _bytes_feature(image_string)
}
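A self-contained version of this snippet is sketched below. The helper definitions follow the TensorFlow tutorial pattern and are assumptions, `example_from_file` is a hypothetical name, and the temporary file only stands in for a real PNG/JPEG. With this layout the read side would decode the bytes with an image decoder such as `tf.io.decode_image` rather than `np.frombuffer`.

```python
import os
import tempfile

import tensorflow as tf

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def example_from_file(image_location, label):
    """Hypothetical helper: store an image file's encoded bytes unchanged."""
    # 'rb' yields bytes; the with-block closes the file handle afterwards.
    with open(image_location, 'rb') as f:
        image_string = f.read()
    feature = {
        'train/label': _int64_feature(label),
        'train/image': _bytes_feature(image_string),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Usage with a temporary stand-in file (a real run would point at a PNG/JPEG).
with tempfile.NamedTemporaryFile(suffix='.bin', delete=False) as tmp:
    tmp.write(b'stand-in image bytes')
try:
    example = example_from_file(tmp.name, label=3)
finally:
    os.remove(tmp.name)
```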
(Adapted from the official TensorFlow TFRecord tutorial.)