Reputation: 2723
I would like to use ImageFolder to create an Image Dataset.
My current image directory structure looks like this:
/root
-- train/
---- 001.jpg
---- 002.jpg
---- ....
-- test/
---- 001.jpg
---- 002.jpg
---- ....
I would like to have a dataset dedicated to training data, and a dataset dedicated to test data.
As far as I understand, doing this:
dataset = ImageFolder(root='root/train')
does not find any images.
Doing
dataset = ImageFolder(root='root')
finds images, but the train and test images are just mixed together.
ImageFolder has a loader argument, but I could not find a use case for it.
How can I discriminate images in the root folder according to the subfolder they belong to?
Upvotes: 1
Views: 8643
Reputation: 9806
ImageFolder expects the data folder (the one that you pass as root) to contain subfolders representing the classes to which its images belong. Something like this:
data/
├── train/
|   ├── class_0/
|   |   ├── 001.jpg
|   |   ├── 002.jpg
|   |   └── 003.jpg
|   └── class_1/
|       ├── 004.jpg
|       └── 005.jpg
└── test/
    ├── class_0/
    |   ├── 006.jpg
    |   └── 007.jpg
    └── class_1/
        ├── 008.jpg
        └── 009.jpg
With the above folder structure, you can do the following:
train_dataset = ImageFolder(root='data/train')
test_dataset = ImageFolder(root='data/test')
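To check which labels a folder layout would produce before building the datasets, the folder scan can be approximated with the standard library. This is only a minimal sketch, not torchvision's code: preview_classes is a hypothetical helper, and it assumes that every immediate subfolder of the root counts as one class, with indices assigned in sorted name order.

```python
import os


def preview_classes(root):
    """Approximate the class-to-index mapping for a folder-per-class layout.

    Hypothetical helper, not part of torchvision. Assumption: every immediate
    subfolder of `root` is one class, and indices follow sorted name order.
    """
    # Only directories count as classes; loose files in `root` are ignored.
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(classes)}
```

For the layout above, preview_classes('data/train') would give {'class_0': 0, 'class_1': 1}, while preview_classes('data') would give {'test': 0, 'train': 1}. The second result illustrates why passing the parent folder as root treats train and test as two classes instead of two datasets.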
Since you don't have that structure, one obvious option is to create class-subfolders and put the images into them. Another option is to create a custom Dataset, see here.
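For the custom Dataset option, something along these lines might work for a flat folder with no class subfolders. It is a rough sketch under stated assumptions: FlatImageDataset and its EXTENSIONS tuple are hypothetical names, Pillow is assumed to be installed, and because the layout carries no labels, each item returns the file name in place of a class index.

```python
import os

from PIL import Image
from torch.utils.data import Dataset


class FlatImageDataset(Dataset):
    """Hypothetical dataset for a folder of images without class subfolders."""

    # Assumption: only these extensions are treated as images.
    EXTENSIONS = ('.jpg', '.jpeg', '.png')

    def __init__(self, root, transform=None):
        self.root = root
        self.transform = transform
        # Sort so the index-to-file mapping is stable across runs.
        self.files = sorted(
            f for f in os.listdir(root) if f.lower().endswith(self.EXTENSIONS)
        )

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        path = os.path.join(self.root, self.files[index])
        # Convert inside the context manager so the file handle is closed.
        with Image.open(path) as img:
            image = img.convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        # No labels exist in this layout, so the file name is returned instead.
        return image, self.files[index]
```

With the layout from the question, this would be used as train_dataset = FlatImageDataset('root/train') and test_dataset = FlatImageDataset('root/test'). To batch the items with a DataLoader, a transform that converts the PIL images to tensors (for example torchvision.transforms.ToTensor()) would also be needed.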
Upvotes: 3
Reputation: 727
I found that creating subfolders for each class, separately for train/val/test, as ImageFolder expects, works very well. Here's a script that I wrote for my own use case; you can modify it for yours:
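import os
import shutil

data_dir = '/content/data/oxford-102-flowers/'
files = ['train.txt', 'test.txt', 'valid.txt']

for list_file in files:
    # 'train.txt' -> 'train'; used as the name of the split folder
    split = os.path.splitext(list_file)[0]
    with open(os.path.join(data_dir, list_file)) as myfile:
        for line in myfile:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            # Each line looks like: jpg/image_05038.jpg 58
            rel_path, label = parts[0], parts[1]
            src = os.path.join(data_dir, rel_path)
            # Target folder: <data_dir>/<split>/<label>/
            sub_dir = os.path.join(data_dir, split, label)
            os.makedirs(sub_dir, exist_ok=True)
            # shutil.copy replaces the shell 'cp' call, so this also runs on Windows
            shutil.copy(src, sub_dir)

print("All files copied to the subfolders")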
I was working on the Oxford-102 dataset and had three .txt files, one each for the train, validation and test sets. Each line of a .txt file contained the location and name of an image followed by its label (e.g. jpg/image_05038.jpg 58, where 58 is the ground-truth class and 'jpg' is the source folder where all the images were stored).
Upvotes: 0