hrzm

Reputation: 131

How to save images as an h5py file?

I have a train folder containing 2000 images of different sizes, plus a labels.csv file. Loading and resizing these images while training the network is time-consuming, so I read about h5py, which looks like a solution for this situation. I tried the following code:

import os
import datetime as dt
from glob import glob

import cv2
import h5py
import pandas as pd

PATH = os.path.abspath(os.path.join('Data'))
SOURCE_IMAGES = os.path.join(PATH, "Train")
print("[INFO] images paths reading")
images = glob(os.path.join(SOURCE_IMAGES, "*.jpg"))
images.sort()
print("[INFO] image labels reading")
labels = pd.read_csv('Data/labels.csv')

# Binary label per image: 1.0 if the "car" column is 1.0, else 0.0
train_labels = [1.0 if value == 1.0 else 0.0 for value in labels["car"]]

data_order = 'tf'  # 'tf': channels last, 'th': channels first

if data_order == 'th':
    train_shape = (len(images), 3, 224, 224)
else:
    train_shape = (len(images), 224, 224, 3)
print("[INFO] h5py file created")

hf=h5py.File('data.hdf5', 'w')

hf.create_dataset("train_img",
                  shape=train_shape,
                  maxshape=train_shape,
                  compression="gzip",
                  compression_opts=9)

hf.create_dataset("train_labels",
            shape=(len(train_labels),),
            maxshape=(None,),
            compression="gzip",
            compression_opts=9)

hf["train_labels"][...] = train_labels


print("[INFO] read and size images")
for i, addr in enumerate(images):
    s = dt.datetime.now()
    img = cv2.imread(addr)
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; store RGB

    hf["train_img"][i, ...] = img[None]
    e = dt.datetime.now()
    print("[INFO] image %d is saved time: %s second" % (i, e - s))

hf.close()

But when I run this code, it runs for hours. At first it is very fast, but later it becomes very slow, especially at this line: hf["train_img"][i, ...] = img[None]. Here is the output of the program. As you can see, the time per image keeps increasing. What am I doing wrong? Thanks for any advice.

[Screenshot of the program output]

Upvotes: 1

Views: 6884

Answers (1)

w-m

Reputation: 11232

train_img is created with compression_opts=9. This is the highest gzip compression level, so it takes the most work to compress and decompress.
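To get a rough feel for that cost outside of h5py, here is a minimal sketch using Python's zlib module, which implements the same DEFLATE algorithm that the gzip filter is based on. The payload is a made-up stand-in for one 224x224 RGB image (half random bytes, half zeros), and time_compression is a hypothetical helper name; real images and real timings will differ by data and machine.

```python
import os
import time
import zlib


def time_compression(payload, levels=(1, 4, 9)):
    """Return {level: (seconds, compressed_size)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        # Sanity check: decompressing must give back the original bytes.
        assert zlib.decompress(compressed) == payload
        results[level] = (elapsed, len(compressed))
    return results


# Assumption: synthetic data the size of one 224x224x3 8-bit image
# (150528 bytes), half incompressible noise and half zeros.
payload = os.urandom(75264) + bytes(75264)

for level, (seconds, size) in time_compression(payload).items():
    print("level %d: %.4f s, %d bytes" % (level, seconds, size))
```

Running this on a few of your own resized images (for example via img.tobytes()) should show whether level 9 buys enough space to justify the extra time.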

If compressing the images is the bottleneck and you can trade that off for some extra disk space, use a lower compression level, such as the default (4), or disable compression completely.
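As a sketch of what that change could look like, the helper below creates train_img either with gzip at a chosen level or without compression. The function name create_train_img, the file name sketch.hdf5 and the in-memory "core" driver are my own choices for illustration and are not from the question; in the real script you would keep writing to data.hdf5 on disk.

```python
import h5py
import numpy as np

IMG_SHAPE = (224, 224, 3)  # matches the 'tf' ordering used in the question


def create_train_img(hf, n_images, level=4):
    """Create "train_img" with gzip at `level`, or uncompressed if level is None."""
    options = {}
    if level is not None:
        options = {"compression": "gzip", "compression_opts": level}
    return hf.create_dataset(
        "train_img",
        shape=(n_images,) + IMG_SHAPE,
        **options
    )


# Assumption: an in-memory file so the sketch leaves nothing on disk.
with h5py.File("sketch.hdf5", "w", driver="core", backing_store=False) as hf:
    dset = create_train_img(hf, n_images=4, level=4)
    dset[0] = np.zeros(IMG_SHAPE)  # stand-in for one resized image
    print(dset.compression, dset.compression_opts)
```

Passing level=None skips the filter entirely, which gives a larger file but avoids the compression work on every write.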

Upvotes: 1
