Md Shopon

Reputation: 823

How to create an image dataset just like the MNIST dataset?

I have 10000 BMP images of handwritten digits. What do I need to do to feed this data to a neural network? For the MNIST dataset I just had to write:

(X_train, y_train), (X_test, y_test) = mnist.load_data()

I am using the Keras library in Python. How can I create such a dataset?

Upvotes: 25

Views: 36451

Answers (4)

aimme

Reputation: 6773

I might be late, but I am posting my answer to help others who visit this question. I will explain the dataset format, how to generate such a dataset, and how to load the resulting file.

What is the file format

These datasets are already vectorized and stored in NumPy's .npz file format. See the Keras Datasets documentation and the MNIST digits classification dataset page for reference. Here is a line copied from the documentation:

tf.keras.datasets.mnist.load_data(path="mnist.npz")

Once you generate a .npz file, you can load it with NumPy and use the arrays much as you would use the default MNIST dataset.

How to generate a .npz file

Here is how to generate such a dataset from all the images in a folder:

# generate and save file
from PIL import Image
import os
import numpy as np

path_to_files = "./images/"
vectorized_images = []

# sort the listing so the image order is reproducible
for file in sorted(os.listdir(path_to_files)):
    image = Image.open(os.path.join(path_to_files, file))
    image_array = np.array(image)
    vectorized_images.append(image_array)
# Save as DataX or any other name, but use the same name when loading it back.
# All images must have the same shape to form a single array.
np.savez("./mnistlikedataset.npz", DataX=np.array(vectorized_images))

If you want to save more than one element, you can do something like this, with the appropriate changes to the rest of the code:

np.savez("./mnistlikedataset.npz",DataX=vectorized_images_x,DataY=vectorized_images_Y)
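The line above does not say where the label array comes from. Here is a minimal sketch of one way to build it, assuming (hypothetically) that every file name starts with its digit followed by an underscore, such as 7_0002.bmp. The helper label_from_filename and the naming scheme are illustrative assumptions, not something the question specifies; adapt them to however your labels are actually stored.

```python
import numpy as np

def label_from_filename(filename):
    """Return the digit before the first underscore (assumed naming scheme)."""
    return int(filename.split("_", 1)[0])

# Hypothetical file names; in practice use sorted(os.listdir(path_to_files))
# so the labels line up with the order in which the images were loaded.
files = ["0_0001.bmp", "7_0002.bmp", "3_0003.bmp"]

vectorized_images_Y = np.array(
    [label_from_filename(f) for f in sorted(files)], dtype=np.uint8
)
```

Iterating over the same sorted listing for both images and labels keeps the two arrays aligned index by index.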

How to load the data file

#load and use file
import numpy as np

path = "./mnistlikedataset.npz"
with np.load(path) as data:
    #load DataX as train_data
    train_data = data['DataX']
    print(train_data)

Similarly, if you want to load multiple elements from a file, you can do something like this, with the appropriate changes:

with np.load(path) as data:
    train_data = data['DataX']
    print(train_data)
    test_data = data['DataY']
    print(test_data)
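To get back the same tuple layout that the question uses with mnist.load_data(), the save and load steps can be wrapped in two small functions. This is only a sketch under assumptions: the function names save_mnist_like and load_mnist_like, the 20% test split, and the key names x_train, y_train, x_test, y_test are my own choices that mirror the variables in the question, and the demo uses random arrays in place of real BMP images.

```python
import os
import tempfile
import numpy as np

def save_mnist_like(images, labels, out_path, test_fraction=0.2, seed=0):
    """Shuffle, split into train/test, and save everything in one .npz file."""
    images = np.asarray(images, dtype=np.uint8)
    labels = np.asarray(labels, dtype=np.uint8)
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    # A fixed seed makes the shuffle (and therefore the split) reproducible.
    order = np.random.default_rng(seed).permutation(len(images))
    n_test = int(len(images) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    np.savez_compressed(
        out_path,
        x_train=images[train_idx], y_train=labels[train_idx],
        x_test=images[test_idx], y_test=labels[test_idx],
    )

def load_mnist_like(path):
    """Return (x_train, y_train), (x_test, y_test) from the saved file."""
    with np.load(path) as data:
        return ((data["x_train"], data["y_train"]),
                (data["x_test"], data["y_test"]))

# Demo with random stand-in data (100 fake 28x28 grayscale images).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=100, dtype=np.uint8)

demo_path = os.path.join(tempfile.mkdtemp(), "mnistlikedataset.npz")
save_mnist_like(fake_images, fake_labels, demo_path)
(X_train, y_train), (X_test, y_test) = load_mnist_like(demo_path)
```

With real data, fake_images and fake_labels would be replaced by the arrays built from your image folder.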

Upvotes: 2

yucui xiao

Reputation: 19

NumPy can save an array to a binary file with np.save:

import numpy as np

def save_data(images, labels):
  # images: sequence of flattened images, all of the same length
  # labels: sequence of matching integer labels
  outshape = len(images[0])
  npimages = np.empty((0, outshape), dtype=np.int32)
  nplabels = np.empty((0,), dtype=np.int32)

  for i in range(len(labels)):
      label = labels[i]
      npimages = np.append(npimages, [images[i]], axis=0)
      nplabels = np.append(nplabels, label)

  np.save('images', npimages)  # written to images.npy
  np.save('labels', nplabels)  # written to labels.npy


def read_data():
  # load the arrays written by save_data()
  return [np.load('images.npy'), np.load('labels.npy')]
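One caveat: np.append copies the whole array on every call, so building the arrays inside a loop gets slow for thousands of images. When all images are already in a list, a single conversion does the same job. This is a small sketch; the function name to_arrays and the toy data are assumptions for illustration.

```python
import numpy as np

def to_arrays(images, labels):
    """Convert lists of flattened images and labels to int32 arrays in one step."""
    npimages = np.asarray(images, dtype=np.int32)
    nplabels = np.asarray(labels, dtype=np.int32)
    return npimages, nplabels

# Toy data: three "images" of four pixels each.
images = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
labels = [0, 1, 2]

npimages, nplabels = to_arrays(images, labels)
```

The resulting arrays can be passed to np.save exactly as before.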

Upvotes: 1

azharimran

Reputation: 31

You should write your own function to load all the images, or do it like this:

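import cv2
import numpy as np
from imutils import paths  # assumption: `paths` is imutils.paths
from tensorflow.keras.utils import img_to_array

# `args` (parsed command-line options) and IMAGE_DIMS (height, width, depth)
# are assumed to be defined earlier in the script
data = []
imagePaths = sorted(list(paths.list_images(args["testset"])))

# loop over the input images
for imagePath in imagePaths:
    # load the image, pre-process it, and store it in the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
    image = img_to_array(image)
    data.append(image)
    # extract the class label from the image path and update the
    # labels list (this step depends on your directory layout)

# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0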
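For grayscale digits like MNIST, the same scaling step is usually combined with adding a channel axis and one-hot encoding the labels before training. The following is a hedged sketch, not part of the code above: prepare_for_keras is a hypothetical helper, it assumes 2-D grayscale images rather than the 3-channel images that cv2.imread returns by default, and it assumes integer labels from 0 to 9.

```python
import numpy as np

def prepare_for_keras(images, labels, num_classes=10):
    """Scale pixels to [0, 1], add a channel axis, and one-hot encode labels."""
    x = np.asarray(images, dtype="float32") / 255.0
    if x.ndim == 3:
        # (N, H, W) -> (N, H, W, 1) for convolutional layers
        x = x[..., np.newaxis]
    # Row i of the identity matrix is the one-hot vector for class i.
    y = np.eye(num_classes, dtype="float32")[np.asarray(labels, dtype=np.int64)]
    return x, y

# Stand-in data: four all-white 28x28 images.
images = np.full((4, 28, 28), 255, dtype=np.uint8)
labels = [0, 3, 9, 3]

x, y = prepare_for_keras(images, labels)
```

If you train with a sparse categorical loss, you can skip the one-hot step and pass the integer labels directly.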

Upvotes: 3

Mikael Rousson

Reputation: 2267

You can either write a function that loads all your images and stacks them into a NumPy array, if everything fits in RAM, or use the Keras ImageDataGenerator (https://keras.io/preprocessing/image/), which includes a flow_from_directory function. You can find an example here: https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d.
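To illustrate the second option, here is a rough sketch of what loading in batches looks like when the data does not fit in RAM. It is not the Keras API: batch_generator, load_fn, and fake_load are hypothetical names, and fake_load stands in for a real image reader so the example runs without any image files.

```python
import numpy as np

def batch_generator(paths, labels, load_fn, batch_size=32):
    """Yield (images, labels) batches, loading images only when needed."""
    for start in range(0, len(paths), batch_size):
        stop = start + batch_size
        # Only the images of the current batch are held in memory.
        x = np.stack([load_fn(p) for p in paths[start:stop]])
        y = np.asarray(labels[start:stop])
        yield x, y

def fake_load(path):
    """Stand-in for a real reader: returns a blank 28x28 image."""
    return np.zeros((28, 28), dtype=np.uint8)

paths = [f"img_{i}.bmp" for i in range(70)]
labels = [i % 10 for i in range(70)]

batches = list(batch_generator(paths, labels, fake_load, batch_size=32))
```

This generator makes a single pass over the data; for training over several epochs it would have to be recreated or wrapped in a loop.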

Upvotes: 11
