Reputation: 505
I have a dataset of about 22,000 images (roughly 900 MB for the whole thing) and I want to import it into Python to train a CNN.
I use the following code to import it and save it all in an array called X:
import numpy as np
import scipy.misc as sm

for i in range(start, end):
    # path is built from a lookup table: ./dataSet/<folder>/<filename>
    imageLink = "./dataSet/" + str(dataSet[i, 0]) + "/" + str(dataSet[i, 1])
    image = sm.imread(imageLink)
    X = np.append(X, image, axis = 0)   # X grows one image at a time
There are a few issues with this:
It's incredibly slow: about 30 minutes imports only about 1,000 images, and it gets slower as the number of images grows.
It takes up a lot of RAM: importing about 2,000 images uses about 16 GB (my machine has only 16 GB, so I end up in swap, which I suppose makes it even slower).
The images are all sized 640 × 480.
Am I doing something wrong or is this normal? Is there any better/faster method to import images?
Thank you.
Upvotes: 1
Views: 674
Reputation: 7552
Here are some general recommendations for this type of task:
Separate preprocessing from training: run a one-time script that reads every image with imread and converts it to NumPy data structures, with all required normalization steps. Store the NumPy objects to disk so that your main training process only needs to read them back using numpy.fromfile().
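A minimal sketch of what that could look like (assumptions: the dataSet lookup table and 640 × 480 RGB images come from the question; imageio.imread stands in for the deprecated scipy.misc.imread; the file name images.bin and the float32 dtype are illustrative):

import numpy as np
import imageio

# One-time preprocessing: convert a chunk of images to a single float array.
n = end - start                                   # images in this chunk
X = np.empty((n, 480, 640, 3), dtype=np.float32)  # preallocate once, no np.append

for i in range(start, end):
    link = "./dataSet/" + str(dataSet[i, 0]) + "/" + str(dataSet[i, 1])
    X[i - start] = imageio.imread(link) / 255.0   # example normalization step

X.tofile("images.bin")  # raw dump; tofile() writes no shape/dtype metadata

# In the training script, read it back and restore shape/dtype yourself:
X = np.fromfile("images.bin", dtype=np.float32).reshape(n, 480, 640, 3)

Note that the full 22,000-image set is roughly 80 GB as float32, so it will not fit in 16 GB of RAM; process it in chunks, or open the saved file with np.memmap to read slices on demand instead of loading everything at once.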
Upvotes: 2