0Knowledge

Reputation: 755

How To Import The MNIST Dataset From Local Directory Using PyTorch

I am writing PyTorch code for the well-known MNIST database of handwritten digits. I downloaded the training and test sets, including the labels, from the main website. The files come as archives such as t10k-images-idx3-ubyte.gz, which extract to t10k-images-idx3-ubyte. My dataset folder looks like this:

MNIST
 Data
  train-images-idx3-ubyte.gz
  train-labels-idx1-ubyte.gz
  t10k-images-idx3-ubyte.gz
  t10k-labels-idx1-ubyte.gz

I wrote the following code to load the data:

def load_dataset():
    data_path = "/home/MNIST/Data/"
    xy_trainPT = torchvision.datasets.ImageFolder(
        root=data_path, transform=torchvision.transforms.ToTensor()
    )
    train_loader = torch.utils.data.DataLoader(
        xy_trainPT, batch_size=64, num_workers=0, shuffle=True
    )
    return train_loader

My code fails with the message: Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp

How can I solve this problem? I would also like to check that my images are loaded, for example with a figure showing the first 5 images from the dataset.

Upvotes: 2

Views: 13401

Answers (3)

Joud C

Reputation: 437

Late to the party, but I came across this post because I was facing a similar issue. I already had the MNIST folder downloaded (via the PyTorch dataset class) somewhere else in my repository, and I didn't want to download it again when I needed it in a different source file.

My problem was that I was passing the MNIST/ folder itself as the root argument, whereas root should be the parent folder that contains the MNIST/ directory. In fact, the docs say:

root (string) – Root directory of dataset where MNIST/raw/train-images-idx3-ubyte and MNIST/raw/t10k-images-idx3-ubyte exist.

So I figured the MNIST/ part of the path should be omitted.

So in my case I had: mnist_dataset = torchvision.datasets.MNIST(root='../MNIST/', train=True, download=False)

which should be changed to: mnist_dataset = torchvision.datasets.MNIST(root='../', train=True, download=False)

Hope this helps.
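If it helps, here is a minimal sketch for checking a candidate root before instantiating the dataset. It relies only on the layout in the docs quote above (root/MNIST/raw/...). The helper name missing_mnist_files is made up for illustration, and the two label file names are assumed from the file listing in the question.

```python
import os

# Extracted file names, taken from the quoted docs and the question's listing.
EXPECTED_RAW_FILES = (
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
)


def missing_mnist_files(root):
    """Return the expected files that are absent from <root>/MNIST/raw.

    Hypothetical helper, not part of torchvision.
    """
    raw_dir = os.path.join(os.path.expanduser(root), "MNIST", "raw")
    return [
        name
        for name in EXPECTED_RAW_FILES
        if not os.path.isfile(os.path.join(raw_dir, name))
    ]
```

An empty list suggests that root points at the right parent folder. If all four names come back, root is probably one level too deep (pointing at MNIST/ itself), or the files sit in a differently named subfolder.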

Upvotes: 0

mostafiz67

Reputation: 169

Read this: "Extract images from .idx3-ubyte file or GZIP via Python"

Update

You can import data using this format

xy_trainPT = torchvision.datasets.MNIST(
    root="~/Handwritten_Deep_L/",
    train=True,
    download=True,
    transform=torchvision.transforms.Compose([torchvision.transforms.ToTensor()]),
)

With download=True, the code first checks whether the root directory (the path you gave) already contains the dataset.

If it does not, the dataset is downloaded from the web.

If it does, the code uses the existing dataset and does not download anything from the internet.

You can verify this yourself: first give a path without any dataset (the data will be downloaded), then give a path that already contains the dataset (nothing will be downloaded).

Upvotes: 2

trialNerror

Reputation: 3553

Welcome to Stack Overflow!

The MNIST dataset is not stored as images, but in a binary format (as indicated by the ubyte extension). Therefore, ImageFolder is not the type of dataset you want. Instead, you will need to use the MNIST dataset class. It can even download the data for you if you have not done so already :)
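To make the "binary format" point concrete, here is a rough standard-library sketch that reads one of these IDX files directly, gzip-compressed or not. It is not what torchvision does internally. The function name read_idx is my own, and it assumes the usual IDX layout: a 4-byte magic number (two zero bytes, a data-type code, a dimension count), followed by one big-endian 32-bit size per dimension, then the raw bytes.

```python
import gzip
import struct


def read_idx(path):
    """Read an IDX file (optionally gzip-compressed).

    Returns (dims, data): a tuple of dimension sizes and the raw payload.
    Only the unsigned-byte data type (code 0x08) is handled here.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        # Magic number: two zero bytes, a data-type code, a dimension count.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError("not an unsigned-byte IDX file: " + path)
        # Each dimension size is a 4-byte big-endian unsigned integer.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = f.read()
    expected = 1
    for d in dims:
        expected *= d
    if len(data) != expected:
        raise ValueError("payload size does not match header")
    return dims, data
```

For an image file, dims is (count, rows, cols), and image i occupies data[i * rows * cols:(i + 1) * rows * cols]. For a label file, dims is (count,) and each byte is one label.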

This is a dataset class, so just instantiate with the proper root path, then put it as the parameter of your dataloader and everything should work just fine.

If you want to check the images, just fetch a few samples by indexing the dataset (its __getitem__ method) or by taking a batch from the dataloader, and save the result as a PNG file (you may need to convert the tensor to a NumPy array first).
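If you would rather eyeball the first few digits without extra dependencies, one alternative to the PNG route is to write them side by side into a binary PGM file, which most image viewers open. This is only a sketch under stated assumptions: write_pgm_strip is a name I made up, and it expects each image as raw 8-bit grayscale bytes in row-major order (for example, 28 x 28 slices of the IDX payload), not as a tensor.

```python
def write_pgm_strip(images, rows, cols, path):
    """Write equally sized 8-bit grayscale images side by side as one PGM.

    `images` is a sequence of bytes objects, each rows * cols long
    (row-major, one byte per pixel).
    """
    for img in images:
        if len(img) != rows * cols:
            raise ValueError("every image must hold rows * cols bytes")
    width = cols * len(images)
    with open(path, "wb") as f:
        # Binary PGM header: magic "P5", width, height, maximum gray value.
        f.write(b"P5\n%d %d\n255\n" % (width, rows))
        for r in range(rows):
            # One output row is row r of each image, left to right.
            for img in images:
                f.write(img[r * cols:(r + 1) * cols])
```

Passing the first five 28 x 28 images gives a 140 x 28 strip. If the digits look scrambled, the rows and cols values are probably swapped or wrong.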

Upvotes: 0
