Reputation: 131
I have a train folder containing 2000 images of different sizes, and a labels.csv file. When training a network, loading and resizing these images is time-consuming. I have read that h5py is a solution for this situation, so I tried the following code:
from __future__ import print_function  # keeps print() working on Python 2 as well

import datetime as dt
import os
from glob import glob

import cv2
import h5py
import pandas as pd

PATH = os.path.abspath(os.path.join('Data'))
SOURCE_IMAGES = os.path.join(PATH, "Train")

print("[INFO] images paths reading")
images = glob(os.path.join(SOURCE_IMAGES, "*.jpg"))
images.sort()

print("[INFO] image labels reading")
labels = pd.read_csv('Data/labels.csv')
train_labels = []
for i in range(len(labels["car"])):
    if labels["car"][i] == 1.0:
        train_labels.append(1.0)
    else:
        train_labels.append(0.0)

# 'th' = channels first (Theano), 'tf' = channels last (TensorFlow)
data_order = 'tf'
if data_order == 'th':
    train_shape = (len(images), 3, 224, 224)
else:
    train_shape = (len(images), 224, 224, 3)

print("[INFO] h5py file created")
hf = h5py.File('data.hdf5', 'w')
hf.create_dataset("train_img",
                  shape=train_shape,
                  maxshape=train_shape,
                  compression="gzip",
                  compression_opts=9)
hf.create_dataset("train_labels",
                  shape=(len(train_labels),),
                  maxshape=(None,),
                  compression="gzip",
                  compression_opts=9)
hf["train_labels"][...] = train_labels

print("[INFO] read and size images")
for i, addr in enumerate(images):
    s = dt.datetime.now()
    img = cv2.imread(images[i])
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR
    hf["train_img"][i, ...] = img[None]
    e = dt.datetime.now()
    print("[INFO] image", str(i), "is saved time:", e - s, "second")

hf.close()
But when I run this code, it runs for hours. It is very fast at first, but later it becomes very slow, especially at this line: hf["train_img"][i, ...] = img[None]. Here is the output of the program. As you can see, the time per image keeps increasing. What am I doing wrong? Thanks for any advice.
Upvotes: 1
Views: 6884
Reputation: 11232
train_img is created with compression_opts=9. This is the highest compression level, so it takes the most work to compress and decompress.
If the time spent compressing the images is a bottleneck and you can trade it for some extra disk space, use a lower compression level, such as the default (=4). Or even disable compression completely.
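A minimal sketch of this suggestion, not the asker's exact pipeline: it writes the same kind of dataset once with gzip level 4 and once with no compression. The helper name write_images, the synthetic random images, the temporary file names, and dtype="uint8" are assumptions made for illustration; they are not in the original code.

```python
import os
import tempfile

import h5py
import numpy as np


def write_images(path, images, compression=None, compression_opts=None):
    """Write a stack of HxWxC images to `path`, one image per assignment.

    `write_images` is a hypothetical helper, not part of h5py.
    """
    with h5py.File(path, "w") as hf:
        dset = hf.create_dataset(
            "train_img",
            shape=images.shape,
            dtype="uint8",  # assumption: 8-bit images, as cv2.imread returns
            compression=compression,
            compression_opts=compression_opts,
        )
        for i in range(images.shape[0]):
            dset[i, ...] = images[i]


# Synthetic stand-in for the resized 224x224 RGB images (hypothetical data).
rng = np.random.RandomState(0)
images = rng.randint(0, 256, size=(8, 224, 224, 3)).astype("uint8")

tmp_dir = tempfile.mkdtemp()
gzip4_path = os.path.join(tmp_dir, "gzip4.hdf5")
raw_path = os.path.join(tmp_dir, "raw.hdf5")

# Lower compression level instead of 9.
write_images(gzip4_path, images, compression="gzip", compression_opts=4)
# No compression at all: larger file, no compression work per write.
write_images(raw_path, images)

# Read one file back to confirm the data round-trips unchanged.
with h5py.File(gzip4_path, "r") as hf:
    restored = hf["train_img"][...]
```

Whether level 4 or no compression is the better trade-off depends on the images and the disk, so it is worth timing both on a small subset before converting all 2000 files.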
Upvotes: 1