Reputation: 383
I am trying to load two datasets and use them both for training.
Package versions: Python 3.7; PyTorch 1.3.1
It is possible to create the data loaders separately and train on them sequentially:
from torch.utils.data import DataLoader, ConcatDataset
from tqdm import tqdm

train_loader_modelnet = DataLoader(
    ModelNet(args.modelnet_root, categories=args.modelnet_categories,
             split='train', transform=transform_modelnet, device=args.device),
    batch_size=args.batch_size, shuffle=True)
train_loader_mydata = DataLoader(
    MyDataset(args.customdata_root, categories=args.mydata_categories,
              split='train', device=args.device),
    batch_size=args.batch_size, shuffle=True)
for e in range(args.epochs):
    for idx, batch in enumerate(tqdm(train_loader_modelnet)):
        ...  # training on dataset1
    for idx, batch in enumerate(tqdm(train_loader_mydata)):
        ...  # training on dataset2
Note: MyDataset is a custom dataset class that implements __len__(self) and __getitem__(self, index). Since the configuration above works, this implementation seems to be OK.
But I would ideally like to combine them into a single dataloader object. I attempted this as per the pytorch documentation:
train_modelnet = ModelNet(args.modelnet_root, categories=args.modelnet_categories,
                          split='train', transform=transform_modelnet, device=args.device)
train_mydata = MyDataset(args.customdata_root, categories=args.mydata_categories,
                         split='train', device=args.device)
train_loader = torch.utils.data.ConcatDataset(train_modelnet, train_mydata)
for e in range(args.epochs):
    for idx, batch in enumerate(tqdm(train_loader)):
        ...  # training on combined
However, on seemingly random batches I get an error of the form 'expected Tensor as element X in argument 0, but got tuple'. Any help would be much appreciated!
> 40%|████ | 53/131 [01:03<02:00, 1.55s/it]
> Traceback (most recent call last):
>   File "/home/chris/Programs/pycharm-anaconda-2019.3.4/plugins/python/helpers/pydev/pydevd.py", line 1434, in _exec
>     pydev_imports.execfile(file, globals, locals)  # execute the script
>   File "/home/chris/Programs/pycharm-anaconda-2019.3.4/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
>     exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File "/home/chris/Documents/4yp/Data/my_kaolin/Classification/pointcloud_classification_combinedset.py", line 83, in <module>
>     for idx, batch in enumerate(tqdm(train_loader)):
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/tqdm/std.py", line 1107, in __iter__
>     for obj in iterable:
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
>     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
>     return self.collate_fn(data)
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
>     return [default_collate(samples) for samples in transposed]
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
>     return [default_collate(samples) for samples in transposed]
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
>     return torch.stack(batch, 0, out=out)
> TypeError: expected Tensor as element 3 in argument 0, but got tuple
Upvotes: 17
Views: 44958
Reputation: 1665
If I got your question right, you have train and dev sets (and their corresponding loaders) as follows:
train_set = CustomDataset(...)
train_loader = DataLoader(dataset=train_set, ...)
dev_set = CustomDataset(...)
dev_loader = DataLoader(dataset=dev_set, ...)
And you want to concatenate them in order to use train+dev as the training data, right? If so, you can simply call:
train_dev_sets = torch.utils.data.ConcatDataset([train_set, dev_set])
train_dev_loader = DataLoader(dataset=train_dev_sets, ...)
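The train_dev_loader is the loader containing data from both sets.
Make sure both datasets return samples with the same structure, that is, the same tensor shapes and types for each field (the same number of features, the same kind of category labels, etc.), because the default collate function stacks each field across the batch.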
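To make this concrete, here is a minimal sketch using TensorDataset objects as stand-ins for the real datasets. The names set_a, set_b and combined, and the shapes and batch size, are assumptions for illustration only; they are not from the question.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Two stand-in datasets with the same per-sample structure:
# a float feature tensor of shape (3,) and an integer label.
set_a = TensorDataset(torch.randn(10, 3), torch.zeros(10, dtype=torch.long))
set_b = TensorDataset(torch.randn(6, 3), torch.ones(6, dtype=torch.long))

# ConcatDataset takes a single iterable of datasets, not separate arguments.
combined = ConcatDataset([set_a, set_b])

# The DataLoader is built on top of the concatenated dataset and
# shuffles across both sources.
loader = DataLoader(combined, batch_size=4, shuffle=True)

# Collect the shape of every batch to confirm that collation works.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
```

Because both stand-in datasets return (features, label) pairs with identical shapes and dtypes, the default collate function can stack every batch without a custom collate_fn.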
Upvotes: 33
Reputation: 3727
Adding to @Leopd's answer, you can pass your own collate_fn to the DataLoader. The idea is that in the collate_fn you define how the individual examples should be stacked to make a batch. Since you are on torch 1.3.1, make sure you are looking at the matching version of the documentation.
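As a rough sketch of what such a collate_fn could look like, assume each sample is a (points, label) pair and that one dataset returns the label as a tensor while the other returns it as a one-element tuple. That sample layout, the helper name collate_mixed, and the plain lists used as stand-in datasets are all hypothetical; adapt them to whatever your datasets actually return.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader

# Plain lists serve as stand-in map-style datasets here (they provide
# __len__ and __getitem__). One yields tensor labels, the other tuple labels.
set_a = [(torch.zeros(5, 3), torch.tensor(0)) for _ in range(4)]
set_b = [(torch.ones(5, 3), (1,)) for _ in range(4)]

def collate_mixed(samples):
    """Convert every field to a tensor before stacking (hypothetical helper)."""
    points = torch.stack([torch.as_tensor(p) for p, _ in samples])
    # Reshape each label to a 0-dim tensor so tensor and tuple labels line up.
    labels = torch.stack([torch.as_tensor(l).reshape(()) for _, l in samples])
    return points, labels

loader = DataLoader(
    ConcatDataset([set_a, set_b]),
    batch_size=8,
    collate_fn=collate_mixed,
)
points, labels = next(iter(loader))
```

Normalising the types inside each dataset's __getitem__ is probably the cleaner fix; a collate_fn like this is mostly useful when you cannot change the dataset classes.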
Let me know if this helps or if you have any followup questions :)
Upvotes: 0
Reputation: 42757
I'd guess the two datasets are sometimes returning different types. When the data are Tensors, torch stacks them, and they had better be the same shape. If they're something like strings, torch will make a tuple out of them. So it sounds like one of your datasets is sometimes returning something that's not a tensor. I'd put some asserts on the output of your dataset to check that it's doing what you want, or dive in with pdb.
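One way to do that check is to walk over the dataset once and report every field that is not a tensor. This is only a sketch: the helper find_non_tensor_fields is made up for illustration, and it assumes each sample is a tuple or list of fields.

```python
import torch

def find_non_tensor_fields(dataset, limit=None):
    """Return (sample_index, field_index, type_name) for each non-tensor field.

    Hypothetical helper; assumes every sample is a tuple or list of fields.
    """
    problems = []
    n = len(dataset) if limit is None else min(limit, len(dataset))
    for i in range(n):
        for j, field in enumerate(dataset[i]):
            if not isinstance(field, torch.Tensor):
                problems.append((i, j, type(field).__name__))
    return problems

# Stand-in data: the second sample returns its label as a tuple.
samples = [
    (torch.zeros(3), torch.tensor(0)),
    (torch.zeros(3), (1,)),
]
report = find_non_tensor_fields(samples)
```

Running this over each dataset separately should show which one produces the tuple that torch.stack complains about.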
Upvotes: 1