Reputation: 823
I have 10000 BMP images of handwritten digits. What do I need to do to feed this data to a neural network? For the MNIST dataset I just had to write
(X_train, y_train), (X_test, y_test) = mnist.load_data()
I am using the Keras library in Python. How can I create such a dataset?
Upvotes: 25
Views: 36451
Reputation: 6773
I might be late, but I am posting this answer to help others who land on this question. It covers the dataset format, how to generate such a dataset, and how to load it.
What is the file format
These datasets are already vectorized and stored in NumPy format; see the Keras Datasets documentation for reference. They are saved in the .npz file format (see "MNIST digits classification dataset"). Here is a line copied from the documentation for reference.
tf.keras.datasets.mnist.load_data(path="mnist.npz")
Once you have generated a .npz file, you can load it with NumPy and use the arrays the same way you use the default MNIST dataset.
How to generate a .npz file
Here is how to generate such a dataset from all the images in a folder
# generate and save file
from PIL import Image
import os
import numpy as np
path_to_files = "./images/"
vectorized_images = []
# sorted() keeps the image order reproducible across runs
for file in sorted(os.listdir(path_to_files)):
    image = Image.open(os.path.join(path_to_files, file))
    image_array = np.array(image)
    vectorized_images.append(image_array)
# Save as DataX or any other name, but use the same name when loading it back.
np.savez("./mnistlikedataset.npz", DataX=vectorized_images)
If you want to save more than one array, you can do something like this, with the appropriate changes to the rest of the code.
np.savez("./mnistlikedataset.npz",DataX=vectorized_images_x,DataY=vectorized_images_Y)
How to load the data file
# load and use file
import numpy as np
path = "./mnistlikedataset.npz"
with np.load(path) as data:
    # load DataX as train_data
    train_data = data['DataX']
    print(train_data)
As with saving, if you want to load several arrays from one file you can do something like this, with the appropriate changes to the rest of the code.
with np.load(path) as data:
    train_data = data['DataX']
    print(train_data)
    test_data = data['DataY']
    print(test_data)
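To get the same call shape as mnist.load_data(), the loading step above can be wrapped in a small function. This is a minimal sketch, not how Keras implements it: the function name load_mnist_like, the test_fraction parameter and the fixed shuffle seed are all assumptions, and DataY is assumed to hold one label per image.

```python
import numpy as np


def load_mnist_like(path, test_fraction=0.2, seed=0):
    """Return (X_train, y_train), (X_test, y_test) from an .npz file.

    Hypothetical helper: assumes the archive stores the images under
    "DataX" and one label per image under "DataY".
    """
    with np.load(path) as data:
        images = data["DataX"]
        labels = data["DataY"]

    if len(images) != len(labels):
        raise ValueError("DataX and DataY must have the same length")

    # Shuffle images and labels with the same permutation before splitting.
    order = np.random.default_rng(seed).permutation(len(images))
    images, labels = images[order], labels[order]

    n_test = int(len(images) * test_fraction)
    n_train = len(images) - n_test
    return (images[:n_train], labels[:n_train]), (images[n_train:], labels[n_train:])
```

It would then be called as (X_train, y_train), (X_test, y_test) = load_mnist_like("./mnistlikedataset.npz").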
Upvotes: 2
Reputation: 19
NumPy can save an array to a binary file with numpy.save:
import numpy as np
def save_data():
    # read_data() must return the images and their labels
    [images, labels] = read_data()
    outshape = len(images[0])
    npimages = np.empty((0, outshape), dtype=np.int32)
    nplabels = np.empty((0,), dtype=np.int32)
    for i in range(len(labels)):
        label = labels[i]
        npimages = np.append(npimages, [images[i]], axis=0)
        nplabels = np.append(nplabels, label)
    # np.save adds the .npy extension automatically
    np.save('images', npimages)
    np.save('labels', nplabels)
def read_data():
    return [np.load('images.npy'), np.load('labels.npy')]
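Calling np.append inside a loop copies the whole array on every iteration. If all images already have the same length, as the code above assumes, the arrays can be built in one step instead. A minimal sketch with made-up sample data; the function name to_arrays is hypothetical.

```python
import numpy as np


def to_arrays(images, labels):
    """Convert equal-length image rows and their labels to int32 arrays."""
    npimages = np.asarray(images, dtype=np.int32)
    nplabels = np.asarray(labels, dtype=np.int32)
    if npimages.shape[0] != nplabels.shape[0]:
        raise ValueError("images and labels must have the same length")
    return npimages, nplabels


# Made-up sample: three flattened 2x2 "images" and their labels.
sample_images = [[0, 255, 255, 0], [255, 0, 0, 255], [0, 0, 255, 255]]
sample_labels = [1, 7, 3]
npimages, nplabels = to_arrays(sample_images, sample_labels)
print(npimages.shape, nplabels.shape)  # (3, 4) (3,)
```

The resulting arrays can be passed to np.save exactly as before.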
Upvotes: 1
Reputation: 31
You should write your own function to load all the images, or do it like this:
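# imports assumed from the names used below; newer Keras versions expose img_to_array as keras.utils.img_to_array
import cv2
import numpy as np
from imutils import paths
from keras.preprocessing.image import img_to_array
# args["testset"] (the image folder) and IMAGE_DIMS are defined elsewhere;
# IMAGE_DIMS is assumed to be (height, width, depth)
data = []
imagePaths = sorted(list(paths.list_images(args["testset"])))
# loop over the input images
for imagePath in imagePaths:
    # load the image, pre-process it, and store it in the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
    image = img_to_array(image)
    data.append(image)
    # extract the class label from the image path and update the
    # labels list
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0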
Upvotes: 3
Reputation: 2267
You can either write a function that loads all your images and stacks them into a NumPy array, if everything fits in RAM, or use the Keras ImageDataGenerator (https://keras.io/preprocessing/image/), which includes a flow_from_directory function. You can find an example here: https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d.
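For the first option, the loader can be quite short when everything fits in memory. The sketch below assumes a hypothetical layout with one subfolder per digit (for example images/0/, images/1/, and so on), which is also the layout flow_from_directory expects, and that all images have the same size. The function name load_image_folder is made up for illustration.

```python
import os

import numpy as np
from PIL import Image


def load_image_folder(root):
    """Load root/<label>/<image files> into (X, y) NumPy arrays.

    Assumes every subfolder name is an integer class label and that
    all images share the same dimensions.
    """
    images, labels = [], []
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        if not os.path.isdir(class_dir):
            continue
        for name in sorted(os.listdir(class_dir)):
            with Image.open(os.path.join(class_dir, name)) as img:
                # Convert to 8-bit grayscale, like the MNIST images.
                images.append(np.array(img.convert("L")))
            labels.append(int(label))
    return np.stack(images), np.array(labels)
```

The returned arrays can be split into training and test sets and passed to a Keras model directly, or saved once with np.savez so the images do not have to be decoded again.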
Upvotes: 11