Arun

Reputation: 2478

Get file names and file path using PyTorch dataloader

I am using PyTorch 1.8 and Python 3.8 to read images from a folder using the following code:

import torch
from torchvision import datasets, transforms

print(f"PyTorch version: {torch.__version__}")
# PyTorch version: 1.8.1

# Device configuration-
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"currently available device: {device}")
# currently available device: cpu


# Define transformations for training and test sets-
transform_train = transforms.Compose(
    [
      # transforms.RandomCrop(32, padding = 4),
      # transforms.RandomHorizontalFlip(),
      transforms.ToTensor(),
      # transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
     ]
     )

transform_test = transforms.Compose(
    [
      transforms.ToTensor(),
      # transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
     ]
     )

# Define directory containing images-
data_dir = 'My_Datasets/Cat_Dog_data/'

# Define datasets-
train_data = datasets.ImageFolder(data_dir + '/train', 
                                  transform = transform_train)
test_data = datasets.ImageFolder(data_dir + '/test', 
                                 transform = transform_test)

print(f"number of train images = {len(train_data)} & number of validation images = {len(test_data)}")
# number of train images = 22500 & number of validation images = 2500

print(f"number of training classes = {len(train_data.classes)} & number of validation classes = {len(test_data.classes)}")
# number of training classes = 2 & number of validation classes = 2

# Define data loaders-
trainloader = torch.utils.data.DataLoader(train_data, batch_size = 32)
testloader = torch.utils.data.DataLoader(test_data, batch_size = 32)

len(trainloader), len(testloader)
# (704, 79)

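# Sanity check (the last partial batch is kept, so the loader lengths are these values rounded up)-
len(train_data) / 32, len(test_data) / 32
# (703.125, 78.125)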

You can iterate through the training data using 'trainloader' as follows:

for img, lab in trainloader:
   print(img.shape, lab.shape)

However, I am interested in getting the file name along with the file path from which the file was read. How can I achieve this?

Thanks!

Upvotes: 3

Views: 8645

Answers (2)

Joe Huamani

Reputation: 21

It would be useful if you can show us how you implemented your data loader.

If that is not possible, you can follow these two guides, which show how to customize the data returned by __getitem__:

reference 1: Multi-Class Classification Using PyTorch: Preparing Data (check Page 2 to see how __getitem__ is defined)

reference 2: Multi-Class Classification Using PyTorch: Training (check Page 2 to see how to use it)

What I would do is add the path and the file name to the dictionary that __getitem__ returns (taken from reference 1).

(modified from reference 1)

def __getitem__(self, idx):

  path = self.path[idx]
  fileName = self.fileName[idx]
  preds = self.x_data[idx]
  trgts = self.y_data[idx]

  sample = { 
    'predictors' : preds,
    'targets' : trgts,
    'path': path,
    'fileName': fileName
  }
  return sample

Then, when you need these values in the training loop, just use the keys to access them.

(modified from reference 2)

for (batch_idx, batch) in enumerate(train_ldr):

    X = batch['predictors']   
    Y = batch['targets']
    path = batch['path']
    fileName = batch['fileName']

    optimizer.zero_grad()
    oupt = net(X)
    # .....
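
To make the two snippets above concrete, here is a minimal sketch of a complete dataset class built around that __getitem__. The class name PathAwareDataset, the toy tensors, and the file paths are all made up for illustration and do not come from the referenced guides. Note that the default collate function leaves strings as a list per batch, so batch['path'] and batch['fileName'] come back as lists of strings rather than tensors.

```python
import os

import torch
from torch.utils.data import DataLoader, Dataset


class PathAwareDataset(Dataset):
    """Hypothetical dataset that stores a directory and file name per sample."""

    def __init__(self, x_data, y_data, file_paths):
        self.x_data = x_data
        self.y_data = y_data
        # Split each full path into its directory and its file name.
        self.path = [os.path.dirname(p) for p in file_paths]
        self.fileName = [os.path.basename(p) for p in file_paths]

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        return {
            'predictors': self.x_data[idx],
            'targets': self.y_data[idx],
            'path': self.path[idx],
            'fileName': self.fileName[idx],
        }


# Toy data standing in for real features; the paths below are invented.
x = torch.randn(4, 3)
y = torch.tensor([0, 1, 0, 1])
files = ['data/cat/c0.jpg', 'data/cat/c1.jpg',
         'data/dog/d0.jpg', 'data/dog/d1.jpg']

train_ldr = DataLoader(PathAwareDataset(x, y, files), batch_size=2)

for batch in train_ldr:
    # Tensors are stacked; strings are returned as a list, one per sample.
    print(batch['predictors'].shape, batch['path'], batch['fileName'])
```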

Upvotes: 1

Shai

Reputation: 114876

The default ImageFolder Dataset holds the paths of all images in self.samples. All you need to do is modify __getitem__ to return the paths as well.

Upvotes: 4
