Reputation: 108
I have a dataset of grayscale images, and I'd like to use the ssd-mobilenet checkpoints to train my object detector. What is the proper way to convert grayscale images to RGB so that I can convert my dataset to TFRecord? Here is the code I use (note that the commented-out parts didn't work for me):
with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
    encoded_jpg = fid.read()
# rgb_image = tf.image.grayscale_to_rgb(
#     tf.image.encode_jpeg(encoded_jpg),
#     name=None
# )
encoded_jpg_io = io.BytesIO(encoded_jpg)
encoded_jpg_io = tf.stack([encoded_jpg_io, encoded_jpg_io, encoded_jpg_io], axis=-1)
image = Image.open(encoded_jpg_io)
width, height = image.size
filename = group.filename.encode('utf8')
image_format = b'jpg'
xmins = []
xmaxs = []
ymins = []
ymaxs = []
classes_text = []
classes = []
for index, row in group.object.iterrows():
    xmins.append(row['xmin'] / width)
    xmaxs.append(row['xmax'] / width)
    ymins.append(row['ymin'] / height)
    ymaxs.append(row['ymax'] / height)
    classes_text.append(row['class'].encode('utf8'))
    classes.append(class_text_to_int(row['class']))
tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(height),
    'image/width': dataset_util.int64_feature(width),
    'image/filename': dataset_util.bytes_feature(filename),
    'image/source_id': dataset_util.bytes_feature(filename),
    # 'image/channels': dataset_util.int64_feature(),
    'image/encoded': dataset_util.bytes_feature(encoded_jpg),
    'image/format': dataset_util.bytes_feature(image_format),
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
Upvotes: 1
Views: 1331
Reputation: 1
Why does 'image/channels': dataset_util.int64_feature(3) work but not
'image/channels': dataset_util.int64_feature(1), given that you are passing grayscale images with one color channel?
Upvotes: 0
Reputation: 108
I tried different methods and finally found a solution (covering not only the conversion to TFRecord, but also the training and the object detection itself).
If the dataset consists only of grayscale images, the TensorFlow Object Detection API only needs the number of channels to be declared as 3. The only necessary change is therefore to add 'image/channels': dataset_util.int64_feature(3)
to the features dictionary in the conversion code. There is no need to convert the grayscale images to RGB with cv2.COLOR_GRAY2BGR or tf.image.grayscale_to_rgb.
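As a rough sketch of that one-line change, assuming TensorFlow is installed: int64_feature and bytes_feature below are local stand-ins for the dataset_util helpers, build_example is a hypothetical name, and the byte string is only a placeholder for the contents of a real grayscale JPEG file.

```python
import tensorflow as tf


def int64_feature(value):
    # Local stand-in (assumption) for dataset_util.int64_feature.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def bytes_feature(value):
    # Local stand-in (assumption) for dataset_util.bytes_feature.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def build_example(encoded_jpg, height, width):
    # encoded_jpg holds the untouched bytes read from the grayscale JPEG;
    # nothing is re-encoded or stacked into three channels here.
    return tf.train.Example(features=tf.train.Features(feature={
        'image/height': int64_feature(height),
        'image/width': int64_feature(width),
        'image/channels': int64_feature(3),  # declared as 3, per the answer
        'image/encoded': bytes_feature(encoded_jpg),
        'image/format': bytes_feature(b'jpg'),
    }))


# Placeholder bytes for illustration only; a real script would pass fid.read().
example = build_example(b'\xff\xd8\xff placeholder', 480, 640)
```

The bounding-box and class features from the question would be added to the same dictionary unchanged.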
Converting the images with cv2 or tf.image.grayscale_to_rgb leads to errors such as:
outofrangeerror FIFOQueue '_3_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
or
OP_REQUIRES failed at iterator_ops.cc:891 : Invalid argument: assertion failed: [Unable to decode bytes as JPEG, PNG, GIF, or BMP]
during training.
To avoid extra work, make sure your images are actually JPEG files. If you have other formats such as BMP, convert them to JPEG first. Note that changing the file extension is not a conversion; the files have to be re-encoded with whatever tool you prefer.
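One way to catch renamed files before building the TFRecord is to look at the first bytes of each file instead of its extension. The sketch below uses only the standard library; sniff_image_format and is_real_jpeg are hypothetical helper names, and the check only inspects the file signature, so it does not prove that the rest of the file decodes correctly.

```python
def sniff_image_format(data):
    """Guess the image format from the leading signature bytes."""
    if data.startswith(b'\xff\xd8\xff'):
        return 'jpeg'
    if data.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'png'
    if data[:6] in (b'GIF87a', b'GIF89a'):
        return 'gif'
    if data.startswith(b'BM'):
        return 'bmp'
    return None  # unknown or empty input


def is_real_jpeg(path):
    # Read only the first few bytes; the extension is ignored on purpose.
    with open(path, 'rb') as fid:
        return sniff_image_format(fid.read(16)) == 'jpeg'
```

A file named image.jpg for which is_real_jpeg returns False was most likely only renamed and still needs to be re-encoded.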
Upvotes: 3