Reputation: 939
Today I made a .tfrecords file from my images. Each image is 2048 pixels wide and 1536 pixels high. The images total about 5.1 GB, but the resulting .tfrecords file is almost 137 GB! More importantly, when I train with it, I get a CUDA_ERROR_OUT_OF_MEMORY error.
Here is the error:
Total memory: 10.91GiB
Free memory: 10.45GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 68705845248 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 68705845248
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 61835259904 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 61835259904
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 68705845248 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 68705845248
E tensorflow/stream_executor/cuda/cuda_driver.cc:1034] failed to alloc 68705845248 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
W ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 68705845248
.........
I am using the smallest batch_size, but it still fails. Does anyone know why? Is there something wrong with my tfrecords file?
Here is the code I use to make the tfrecords file:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
import cv2
import os
import os.path
from PIL import Image
train_file = 'train.txt'
name = 'trainxx'
output_directory = './tfrecords'
resize_height = 1536
resize_width = 2048
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def load_file(examples_list_file):
    lines = np.genfromtxt(examples_list_file, delimiter=" ", dtype=[('col1', 'S120'), ('col2', 'i8')])
    examples = []
    labels = []
    for example, label in lines:
        examples.append(example)
        labels.append(label)
    return np.asarray(examples), np.asarray(labels), len(lines)
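def extract_image(filename, resize_height, resize_width):
    image = cv2.imread(filename)
    # cv2.resize expects the target size as (width, height)
    image = cv2.resize(image, (resize_width, resize_height))
    # OpenCV loads images as BGR; reorder the channels to RGB
    b, g, r = cv2.split(image)
    rgb_image = cv2.merge([r, g, b])
    return rgb_image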
def transform2tfrecord(train_file, name, output_directory, resize_height, resize_width):
    if not os.path.exists(output_directory) or os.path.isfile(output_directory):
        os.makedirs(output_directory)
    _examples, _labels, examples_num = load_file(train_file)
    filename = output_directory + "/" + name + '.tfrecords'
    writer = tf.python_io.TFRecordWriter(filename)
    for i, [example, label] in enumerate(zip(_examples, _labels)):
        print('No.%d' % (i))
        image = extract_image(example, resize_height, resize_width)
        print('shape: %d, %d, %d, label: %d' % (image.shape[0], image.shape[1], image.shape[2], label))
        image_raw = image.tostring()
        example = tf.train.Example(features=tf.train.Features(feature={
            'image_raw': _bytes_feature(image_raw),
            'height': _int64_feature(image.shape[0]),
            'width': _int64_feature(image.shape[1]),
            'depth': _int64_feature(image.shape[2]),
            'label': _int64_feature(label)
        }))
        writer.write(example.SerializeToString())
    writer.close()
def disp_tfrecords(tfrecord_list_file):
    filename_queue = tf.train.string_input_producer([tfrecord_list_file])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'image_raw': tf.FixedLenFeature([], tf.string),
            'height': tf.FixedLenFeature([], tf.int64),
            'width': tf.FixedLenFeature([], tf.int64),
            'depth': tf.FixedLenFeature([], tf.int64),
            'label': tf.FixedLenFeature([], tf.int64)
        }
    )
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    # print(repr(image))
    height = features['height']
    width = features['width']
    depth = features['depth']
    label = tf.cast(features['label'], tf.int32)
    init_op = tf.initialize_all_variables()
    resultImg = []
    resultLabel = []
    with tf.Session() as sess:
        sess.run(init_op)
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        for i in range(21):
            # Fetch everything in a single run() call: separate eval() calls
            # would each read a new record, so image and label would not match.
            image_eval, label_eval, h, w, d = sess.run([image, label, height, width, depth])
            resultLabel.append(label_eval)
            image_eval_reshape = image_eval.reshape([h, w, d])
            resultImg.append(image_eval_reshape)
            pilimg = Image.fromarray(np.asarray(image_eval_reshape))
            pilimg.show()
        coord.request_stop()
        coord.join(threads)
    return resultImg, resultLabel
def read_tfrecord(filename_queuetemp):
    filename_queue = tf.train.string_input_producer([filename_queuetemp])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'image_raw': tf.FixedLenFeature([], tf.string),
            'height': tf.FixedLenFeature([], tf.int64),
            'width': tf.FixedLenFeature([], tf.int64),
            'depth': tf.FixedLenFeature([], tf.int64),
            'label': tf.FixedLenFeature([], tf.int64)
        }
    )
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    # image: restore the shape stored alongside the raw bytes
    image_shape = tf.stack([features['height'], features['width'], features['depth']])
    image = tf.reshape(image, image_shape)
    # normalize
    image = tf.cast(image, tf.float32) * (1. / 255) - 0.5
    # label
    label = tf.cast(features['label'], tf.int32)
    return image, label
def test():
    transform2tfrecord(train_file, name, output_directory, resize_height, resize_width)
    img, label = disp_tfrecords(output_directory + '/' + name + '.tfrecords')
    img, label = read_tfrecord(output_directory + '/' + name + '.tfrecords')
    print(label)

if __name__ == '__main__':
    test()
Upvotes: 1
Views: 1795
Reputation: 19153
I didn't go through all your code, but I think I found the reason for the explosion in size of your dataset.
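Your conversion process reads each image file, decodes it into a pixel array (cv2.imread), and writes the array's raw bytes (image.tostring()) into the .tfrecords file.
Image files are normally compressed. Whether the compression is lossy or lossless, they are stored in a space-efficient way. You're throwing that efficiency away when you decode the image and save the raw, uncompressed bytes: a 2048x1536 RGB image takes about 9.4 MB once decoded, regardless of how small the file on disk was.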
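To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes 8-bit RGB pixels at the 2048x1536 size from the question and decimal gigabytes. The number of images is not given in the question, so it is only estimated here from the reported 137 GB file size.

```python
# Rough size estimate for raw (decoded) pixels versus the compressed files.
# Assumptions: 8-bit RGB pixels, decimal gigabytes, and an image count that is
# inferred from the reported 137 GB file (the question does not state it).
WIDTH, HEIGHT, CHANNELS = 2048, 1536, 3

raw_bytes_per_image = WIDTH * HEIGHT * CHANNELS  # one uint8 per channel value
approx_image_count = round(137e9 / raw_bytes_per_image)
approx_compressed_bytes = 5.1e9 / approx_image_count

print("raw bytes per image:", raw_bytes_per_image)
print("estimated number of images:", approx_image_count)
print("average compressed size (KB):", round(approx_compressed_bytes / 1e3))
print("expansion factor:", round(137e9 / 5.1e9, 1))
```

If the estimate is in the right ballpark, each compressed file of a few hundred kilobytes turns into several megabytes of raw pixels, which is consistent with the size explosion described in the question.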
Note: I don't know how your input pipeline is set up so I'm making some assumptions here, but I believe I'm getting them right.
The problem here is that, because the tfrecord file holds decoded images, every example is very large. When you set up an input pipeline, data is read and queued so that later stages of the pipeline can process it. My guess is that your example queue grows so large that it runs out of memory, simply because of the size of each individual example.
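As a rough illustration of how quickly buffered examples add up, here is a small estimate. It assumes each image is cast to float32, as read_tfrecord in the question does. The queue capacity of 1000 is purely hypothetical, since the question does not show how the training queue is configured.

```python
# Rough estimate of the host memory needed to buffer decoded examples.
# `queue_capacity` is a hypothetical value; the question does not show how
# the training queue is configured.
WIDTH, HEIGHT, CHANNELS = 2048, 1536, 3
BYTES_PER_FLOAT32 = 4  # read_tfrecord casts the image to tf.float32


def buffered_bytes(queue_capacity, bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes held by `queue_capacity` decoded images of the size above."""
    return queue_capacity * WIDTH * HEIGHT * CHANNELS * bytes_per_value


queue_capacity = 1000  # assumption for illustration only
print("GiB buffered: %.1f" % (buffered_bytes(queue_capacity) / 2**30))
```

Under these assumptions the buffer alone would need tens of gigabytes of host memory. Note that the decoded tensors have this size no matter how the records are stored, so any buffer that sits after the decoding step should be kept small as well.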
There is a simple change you can make to fix your problem: store the raw bytes of the compressed file in your .tfrecord and decode them directly in TensorFlow. The process should look as follows:
Open the binary file and read out its content as a byte string:
with open(my_image_filename, 'rb') as fp:
    raw_image = fp.read()
Write the raw_image byte string to the .tfrecord file.
In your input pipeline, decode the image with tf.image.decode_image() or one of its more specific variants.
This way, the decoded image is not stored anywhere until you actually need it, so both your queues and your tfrecord file stay a reasonable size.
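The steps above could be sketched as follows. This is a minimal sketch, not the asker's pipeline: it assumes TensorFlow 2.x with eager execution and uses tf.data instead of the queue-based readers from the question. The feature name 'image_encoded', the helper names, and the tiny synthetic PNG are all made up for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def write_compressed_record(image_paths, labels, record_path):
    """Store each file's encoded bytes as-is, without decoding them."""
    with tf.io.TFRecordWriter(record_path) as writer:
        for path, label in zip(image_paths, labels):
            with open(path, 'rb') as fp:
                encoded = fp.read()  # still compressed (JPEG, PNG, ...)
            example = tf.train.Example(features=tf.train.Features(feature={
                'image_encoded': _bytes_feature(encoded),  # hypothetical name
                'label': _int64_feature(label),
            }))
            writer.write(example.SerializeToString())


def parse_example(serialized):
    """Decode the image only when the example is actually consumed."""
    features = tf.io.parse_single_example(serialized, {
        'image_encoded': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    })
    image = tf.image.decode_image(features['image_encoded'], channels=3)
    image = tf.cast(image, tf.float32) * (1. / 255) - 0.5
    return image, tf.cast(features['label'], tf.int32)


# Tiny self-contained demo: a synthetic 4x6 PNG stands in for a real image.
tmp_dir = tempfile.mkdtemp()
demo_image_path = os.path.join(tmp_dir, 'demo.png')
demo_pixels = np.zeros((4, 6, 3), dtype=np.uint8)
with open(demo_image_path, 'wb') as fp:
    fp.write(tf.io.encode_png(demo_pixels).numpy())

demo_record_path = os.path.join(tmp_dir, 'demo.tfrecords')
write_compressed_record([demo_image_path], [1], demo_record_path)

dataset = tf.data.TFRecordDataset(demo_record_path).map(parse_example)
for image, label in dataset:
    print(image.shape, int(label))
```

With this layout the .tfrecords file should end up roughly the size of the original image files plus a small per-record overhead, because the pixels are only expanded inside parse_example.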
You're mixing OpenCV and TensorFlow, but this is not necessary. TensorFlow has everything you need to convert your dataset to .tfrecord files first and to decode the images afterwards, and IMO it's much simpler to just stick to TensorFlow's API. Here's the guide on how to set up the conversion and the input pipeline, which shows the "typical .tfrecord conversion pipeline" I described above, plus a few more tricks if you have other needs (like reading the filenames from a .csv file).
Upvotes: 3