Reputation: 11
I am getting into neural networks and image recognition in Python and followed this guide.
They use: from keras.datasets import cifar10
to get the images for testing. So my questions are:
Thanks in advance!
Upvotes: 1
Views: 5787
Reputation: 1197
The easiest way to load your dataset for training or testing is the Keras ImageDataGenerator class, which also gives you some data augmentation methods. You have three options:
If your dataset is structured like this :
data/
    train/
        dogs/
            dog001.jpg
            dog002.jpg
            ...
        cats/
            cat001.jpg
            cat002.jpg
            ...
    validation/
        dogs/
            dog001.jpg
            dog002.jpg
            ...
        cats/
            cat001.jpg
            cat002.jpg
            ...
Then you should use .flow_from_directory(directory). A really good example is provided here.
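As a rough sketch of that first option, assuming the data/train and data/validation layout above: the image size, batch size, augmentation settings and class_mode="binary" below are my own illustrative choices, and list_classes is a hypothetical helper that only mimics how the sub-folder names are turned into labels (sorted by name). ImageDataGenerator is marked as deprecated in recent TensorFlow releases, so this assumes a version that still ships it.

```python
from pathlib import Path


def list_classes(split_dir):
    """Map each sub-folder name to an integer label, sorted by name.

    Hypothetical helper: it mirrors the name-sorted class ordering that
    flow_from_directory uses, so you can check the labels beforehand.
    """
    names = sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(names)}


def make_directory_iterators(data_dir="data", image_size=(150, 150), batch_size=32):
    """Build train/validation iterators from the folder layout shown above."""
    # Imported lazily so list_classes works even without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Augment only the training images; both splits are rescaled to [0, 1].
    train_datagen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)
    val_datagen = ImageDataGenerator(rescale=1.0 / 255)

    train_it = train_datagen.flow_from_directory(
        str(Path(data_dir) / "train"),
        target_size=image_size,   # assumed size, images are resized on load
        batch_size=batch_size,
        class_mode="binary",      # two classes (cats, dogs) in this layout
    )
    val_it = val_datagen.flow_from_directory(
        str(Path(data_dir) / "validation"),
        target_size=image_size,
        batch_size=batch_size,
        class_mode="binary",
        shuffle=False,            # keep a stable order for evaluation
    )
    return train_it, val_it
```

The returned iterators can be passed straight to model.fit(train_it, validation_data=val_it, ...). With more than two classes you would probably switch class_mode to "categorical".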
If your entire dataset can be loaded into a single array (which implies that your dataset is quite small, otherwise you will run out of RAM), you should go for the .flow() function. Example here.
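To get a feeling for "quite small", you can estimate the array size before loading anything. The sketch below is only an illustration: array_size_bytes and make_array_iterator are hypothetical helper names, and the augmentation settings are arbitrary. It again assumes a TensorFlow version that still includes ImageDataGenerator.

```python
def array_size_bytes(n_images, height, width, channels, bytes_per_value=4):
    """Rough memory needed to hold the whole dataset in one dense array.

    bytes_per_value is 1 for uint8 pixels and 4 for float32.
    """
    return n_images * height * width * channels * bytes_per_value


def make_array_iterator(x, y, batch_size=32):
    """Wrap in-memory arrays in an augmenting iterator.

    x is expected to be a NumPy array of shape (n, height, width, channels)
    and y an array with one label per image.
    """
    # Imported lazily so the size estimate works without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        rescale=1.0 / 255,     # scale pixel values to [0, 1]
        rotation_range=15,     # illustrative augmentation settings
        horizontal_flip=True,
    )
    return datagen.flow(x, y, batch_size=batch_size)


# The CIFAR-10 training set: 50,000 images of 32x32x3 stored as uint8.
print(array_size_bytes(50_000, 32, 32, 3, bytes_per_value=1))  # 153600000
```

So CIFAR-10 fits in memory comfortably (roughly 150 MB), which is why the guide can load it with (x_train, y_train), _ = cifar10.load_data() and you could pass those arrays to make_array_iterator. A few hundred thousand large photos would not fit this way.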
You can also use a pandas DataFrame to store information (such as paths, labels...) about each of your samples; in that case the proper function is .flow_from_dataframe(df). See here for a detailed example.
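A minimal sketch of the DataFrame route, under the assumption that all images sit in one flat folder and the label can be read from the file name (dog001.jpg, cat002.jpg). The helpers label_from_filename, build_records and make_dataframe_iterator are hypothetical names of mine, and the column names "filename" and "class" are just the ones I pass to x_col and y_col.

```python
from pathlib import Path


def label_from_filename(name):
    """'dog001.jpg' -> 'dog' (drop the extension, then the trailing digits)."""
    return Path(name).stem.rstrip("0123456789")


def build_records(filenames):
    """One dict per sample, ready to be turned into a DataFrame."""
    return [{"filename": f, "class": label_from_filename(f)} for f in filenames]


def make_dataframe_iterator(records, image_dir, image_size=(150, 150), batch_size=32):
    """Create an iterator that reads the images listed in the records."""
    # Imported lazily so the helpers above work without pandas/TensorFlow.
    import pandas as pd
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    df = pd.DataFrame(records)
    datagen = ImageDataGenerator(rescale=1.0 / 255)
    return datagen.flow_from_dataframe(
        df,
        directory=image_dir,       # folder the file names are relative to
        x_col="filename",
        y_col="class",             # string labels, one-hot encoded by Keras
        target_size=image_size,    # assumed size, images are resized on load
        class_mode="categorical",
        batch_size=batch_size,
    )


print(build_records(["dog001.jpg", "cat002.jpg"]))
```

In practice the DataFrame often comes from a CSV file with pd.read_csv instead, and then you only need the flow_from_dataframe call.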
Lastly, if none of these functions can be applied, you should create a custom data generator. That can be the case if, for example, you have a huge dataset (so you have to work with image paths) and quite unorthodox labels (preventing you from using the .flow_from_directory() or .flow_from_dataframe() methods), or if you simply want to apply more powerful data augmentation. You can see examples of how to create one here and here, and there is an example of one using Imgaug for data augmentation.
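To show the idea behind a custom generator without tying it to one framework, here is a minimal sketch in plain Python. It is not the approach from the linked examples: batch_generator, steps_per_epoch and load_fn are hypothetical names, and a real Keras generator would usually subclass keras.utils.Sequence and return NumPy arrays (for instance by stacking the loaded images) instead of lists.

```python
import math
import random


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def batch_generator(paths, labels, batch_size, load_fn, shuffle=True, seed=None):
    """Yield (images, labels) batches forever, loading each file on demand.

    Only the paths are kept in memory; load_fn(path) is called when a batch
    is built, which is also the place to apply custom augmentation.
    """
    rng = random.Random(seed)
    indices = list(range(len(paths)))
    while True:
        if shuffle:
            rng.shuffle(indices)  # new sample order for every pass
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            images = [load_fn(paths[i]) for i in batch]
            targets = [labels[i] for i in batch]
            yield images, targets


# Toy usage: str.upper stands in for a real image-loading function.
gen = batch_generator(["a.jpg", "b.jpg", "c.jpg"], [0, 1, 0], batch_size=2,
                      load_fn=str.upper, shuffle=False)
print(next(gen))  # (['A.JPG', 'B.JPG'], [0, 1])
print(next(gen))  # (['C.JPG'], [0])
```

Because the generator never ends, the training loop has to be told how many batches make up one epoch, which is what steps_per_epoch computes.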
There's plenty of documentation and examples online, so you should not have trouble finding what's best for you.
Upvotes: 5