Beginner

Reputation: 749

Importing MNIST dataset from local directory in a closed system

I am trying to run an MNIST-based tutorial on a cluster. The node where the training script runs does not have internet access, so I am manually placing the MNIST dataset in the desired directory, but I get a "Dataset not found" error. I have tried this answer, but it does not resolve my problem. Below are my code modifications:

import horovod.torch as hvd
train_dataset = \
datasets.MNIST('/scratch/netra/MNIST/processed/training.pt-%d' % hvd.rank(), train=True, download=True,
               transform=transforms.Compose([
                   transforms.ToTensor(),
                   transforms.Normalize((0.1307,), (0.3081,))
               ]))
test_dataset = \
datasets.MNIST('/scratch/netra/MNIST/processed/test.pt-%d' % hvd.rank(), train=False, 
               transform=transforms.Compose([
                   transforms.ToTensor(),
                   transforms.Normalize((0.1307,), (0.3081,))
               ]))

How can I resolve this?

Upvotes: 0

Views: 843

Answers (2)

John Stud

Reputation: 1779

If the above does not work, try putting those .pt files in a folder called data in your current working directory:

import os
from torchvision import datasets, transforms

CURR_DIR = os.getcwd()
print(CURR_DIR)

train = datasets.MNIST(root='./data', download=False, train=True,
               transform=transforms.Compose([transforms.ToTensor(),
                                             transforms.Normalize((0.1307,), (0.3081,))]))
# works

train = datasets.MNIST(root=os.path.join(CURR_DIR, 'data'),  # portable across Windows and Linux
                       download=False, train=True,
               transform=transforms.Compose([
                   transforms.ToTensor(),
                   transforms.Normalize((0.1307,), (0.3081,))]))
# works

# same files also in this folder
train = datasets.MNIST(root=os.path.join(CURR_DIR, 'processed'), download=False, train=True,
               transform=transforms.Compose([
                   transforms.ToTensor(),
                   transforms.Normalize((0.1307,), (0.3081,))
               ]))
# Dataset not found

Interestingly, the last example fails even though that folder is exactly where the torchvision MNIST dataset class places the .pt files it generates.
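A possible explanation for the last case: the dataset class treats root as a parent directory and appends its own subfolders to it, so pointing root at the folder that already holds the .pt files makes it look one level too deep. The sketch below only builds the paths with the standard library. It assumes the <root>/MNIST/processed/ layout described in the torchvision docstring (other torchvision versions may lay the files out differently), and /home/user/project is a made-up working directory, not a path from the question.

```python
from pathlib import PurePosixPath


def expected_processed_files(root):
    """Return where the processed MNIST files would be looked up under `root`.

    Assumption: the <root>/MNIST/processed/ layout from the torchvision docstring.
    """
    base = PurePosixPath(root) / "MNIST" / "processed"
    return [str(base / "training.pt"), str(base / "test.pt")]


# Hypothetical working directory, used only for illustration.
cwd = "/home/user/project"

# root is the parent folder: the paths match where the files were placed.
print(expected_processed_files(cwd + "/data"))

# root is the folder holding the .pt files: the lookup goes one level too deep.
print(expected_processed_files(cwd + "/data/MNIST/processed"))
```

The second call yields paths ending in MNIST/processed/MNIST/processed/..., which do not exist, and that would be consistent with the "Dataset not found" error.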

Upvotes: 1

trsvchn

Reputation: 8981

You have to specify a root folder, not a full path to the processed file:

root (string): Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.

In your case:

root is /scratch/netra

Thus,

# Every rank reads the same manually placed files, so root needs no per-rank suffix.
# download=False because the node has no internet access.
train_dataset = \
datasets.MNIST('/scratch/netra', train=True, download=False,
               transform=transforms.Compose([
                   transforms.ToTensor(),
                   transforms.Normalize((0.1307,), (0.3081,))
               ]))
test_dataset = \
datasets.MNIST('/scratch/netra', train=False, download=False,
               transform=transforms.Compose([
                   transforms.ToTensor(),
                   transforms.Normalize((0.1307,), (0.3081,))
               ]))
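On a node without internet access it can help to check the files before constructing the dataset, so a wrong root is reported with the exact paths that were expected. This is only a sketch under the same assumption as the docstring above (files in <root>/MNIST/processed/); missing_mnist_files is a hypothetical helper, not part of torchvision, and the temporary directory stands in for /scratch/netra.

```python
import os
import tempfile


def missing_mnist_files(root):
    """List the expected processed MNIST files that are absent under `root`.

    Assumption: files live in <root>/MNIST/processed/, as in the quoted docstring.
    """
    processed = os.path.join(root, "MNIST", "processed")
    expected = [os.path.join(processed, name) for name in ("training.pt", "test.pt")]
    return [path for path in expected if not os.path.isfile(path)]


# Demonstration on a throwaway directory standing in for /scratch/netra.
with tempfile.TemporaryDirectory() as root:
    # Nothing has been placed yet, so both expected paths are reported.
    print(missing_mnist_files(root))

    # Create empty placeholder files in the expected layout.
    processed = os.path.join(root, "MNIST", "processed")
    os.makedirs(processed)
    for name in ("training.pt", "test.pt"):
        open(os.path.join(processed, name), "wb").close()

    # With the files in place, the list of missing paths is empty.
    print(missing_mnist_files(root))
```

In the training script, calling missing_mnist_files('/scratch/netra') before datasets.MNIST(...) and raising an error if the list is non-empty would show immediately which path to fix.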

Upvotes: 1
