Reputation: 755
I am writing code for the well-known MNIST database of handwritten digits in PyTorch. I downloaded the training and test sets, including the labels, from the main website. Each file has a name like t10k-images-idx3-ubyte.gz, which becomes t10k-images-idx3-ubyte after extraction. My dataset folder looks like this:
MNIST
    Data
        train-images-idx3-ubyte.gz
        train-labels-idx1-ubyte.gz
        t10k-images-idx3-ubyte.gz
        t10k-labels-idx1-ubyte.gz
I wrote the following code to load the data:
import torch
import torchvision


def load_dataset():
    data_path = "/home/MNIST/Data/"
    xy_trainPT = torchvision.datasets.ImageFolder(
        root=data_path, transform=torchvision.transforms.ToTensor()
    )
    train_loader = torch.utils.data.DataLoader(
        xy_trainPT, batch_size=64, num_workers=0, shuffle=True
    )
    return train_loader
My code fails with an error message that includes: Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp
How can I solve this problem? I would also like to check that my images are loaded, for example with a single figure that shows the first 5 images from the dataset.
Upvotes: 2
Views: 13401
Reputation: 437
Late to the party, but I came across this post because I was facing a similar issue. I already had the MNIST folder downloaded (via the PyTorch dataset) somewhere else in my repository, and I didn't want to download it again when I needed it in a different source file.
My problem was that when passing the root argument, I was referencing the MNIST/ folder, but you should actually reference the parent folder that contains the MNIST/ directory. In fact, the docs mention:
root (string) – Root directory of dataset where MNIST/raw/train-images-idx3-ubyte and MNIST/raw/t10k-images-idx3-ubyte exist.
So I figured the MNIST/ part of the path should be omitted.
So in my case I had:
mnist_dataset = torchvision.datasets.MNIST(root='../MNIST/', train=True, download=False)
which should be changed to:
mnist_dataset = torchvision.datasets.MNIST(root='../', train=True, download=False)
Hope this helps.
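To make the root convention concrete, here is a minimal sketch that reports which files are missing under root/MNIST/raw. It assumes the layout quoted from the docs above and the four file names from the question (without the .gz suffix). The helper name missing_raw_files is hypothetical, and the exact check torchvision performs may differ between versions.

```python
from pathlib import Path

# File names taken from the question's listing, without the .gz suffix.
RAW_FILES = (
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
)


def missing_raw_files(root):
    """Return the expected file names that are absent under <root>/MNIST/raw.

    Hypothetical helper: it only mirrors the directory layout described in
    the docs and does not call torchvision.
    """
    raw_dir = Path(root).expanduser() / "MNIST" / "raw"
    return [name for name in RAW_FILES if not (raw_dir / name).is_file()]
```

If the list comes back empty for a given root, that root is a plausible value to pass with download=False. Passing the MNIST/ folder itself makes the helper look in MNIST/MNIST/raw, which reproduces the mistake described above.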
Upvotes: 0
Reputation: 169
See "Extract images from .idx3-ubyte file or GZIP via Python".
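For readers who want to see what such an extraction involves, here is a minimal sketch of reading an IDX file with only the standard library. It assumes the usual IDX header: two zero bytes, a type byte (0x08 for unsigned bytes), a dimension count, and then one big-endian 32-bit size per dimension. The function names read_idx and image_rows are hypothetical, and this is not the code from the linked post.

```python
import gzip
import struct


def read_idx(path):
    """Parse an unsigned-byte IDX file (plain or .gz) into (shape, raw bytes)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        data = f.read()

    # Header: two zero bytes, a type code, and the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")

    # One big-endian 32-bit size per dimension, then the raw values.
    header_end = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, data[4:header_end])
    payload = data[header_end:]

    expected = 1
    for dim in shape:
        expected *= dim
    if len(payload) != expected:
        raise ValueError("payload size does not match the header")
    return shape, payload


def image_rows(shape, payload, index):
    """Return image number `index` as a list of rows of pixel values (0-255)."""
    _, rows, cols = shape
    start = index * rows * cols
    return [
        list(payload[start + r * cols : start + (r + 1) * cols])
        for r in range(rows)
    ]
```

Under these assumptions, calling read_idx on one of the image files from the question should give a three-element shape (image count, rows, columns), and a label file should give a one-element shape.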
Update
You can load the data like this:
xy_trainPT = torchvision.datasets.MNIST(
    root="~/Handwritten_Deep_L/",
    train=True,
    download=True,
    transform=torchvision.transforms.Compose([torchvision.transforms.ToTensor()]),
)
Here is what happens with download=True. The code first checks whether the root directory (the path you gave) already contains the dataset.
If not, the dataset is downloaded from the web.
If it does, the code uses the existing dataset and does not download anything from the internet.
You can check this yourself: first give a path that contains no dataset (the data will be downloaded), and then give a path that already contains the dataset (nothing will be downloaded).
Upvotes: 2
Reputation: 3553
Welcome to Stack Overflow!
The MNIST dataset is not stored as images but in a binary format (as indicated by the ubyte extension). Therefore, ImageFolder is not the type of dataset you want. Instead, you will need to use the MNIST dataset class. It can even download the data if you have not done so already :)
This is a dataset class, so just instantiate it with the proper root path, then pass it to your dataloader, and everything should work just fine.
If you want to check the images, take a few samples by indexing the dataset, or take one batch from the dataloader, and save the result as a PNG file (you may need to convert the tensor to a NumPy array first).
Upvotes: 0