Reputation: 311
I have been using PyTorch a lot and have gotten used to its DataLoaders and transforms, in particular for data augmentation, as they are very user-friendly and easy to understand.
However, I now need to run some ML models from sklearn.
Is there a way to use PyTorch's DataLoaders with sklearn?
Upvotes: 2
Views: 4647
Reputation: 80
I came across the skorch library recently, and it could help you.
"The goal of skorch is to make it possible to use PyTorch with sklearn. "
From the skorch docs:
class skorch.dataset.Dataset(X, y=None, length=None)
General dataset wrapper that can be used in conjunction with PyTorch DataLoader.
I guess you could use this Dataset class to wrap your data, feed it to a PyTorch DataLoader, and then use sklearn models on the batches. If you would like to use other PyTorch features such as tensors, you could do that as well.
Upvotes: 0
Reputation: 36584
Yes, you can. You can do this with sklearn's partial_fit method. Read HERE.
6.1.3. Incremental learning
Finally, for 3. we have a number of options inside scikit-learn. Although all algorithms cannot learn incrementally (i.e. without seeing all the instances at once), all estimators implementing the partial_fit API are candidates. Actually, the ability to learn incrementally from a mini-batch of instances (sometimes called “online learning”) is key to out-of-core learning as it guarantees that at any given time there will be only a small amount of instances in the main memory. Choosing a good size for the mini-batch that balances relevancy and memory footprint could involve some tuning [1].
Not all algorithms can do this, however.
Then, you can use PyTorch's DataLoader to preprocess the data and feed it in batches to partial_fit.
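A minimal sketch of that loop, assuming SGDClassifier as the incremental estimator (any estimator exposing partial_fit would work; the data and batch size are illustrative):

```python
# Hedged sketch: stream mini-batches from a PyTorch DataLoader into an
# sklearn estimator that implements the partial_fit API.
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.linear_model import SGDClassifier

# Illustrative toy data: 200 samples, 10 features, binary labels.
X = torch.randn(200, 10)
y = torch.randint(0, 2, (200,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

clf = SGDClassifier()
# partial_fit must be told the full set of classes on the first call,
# since no single mini-batch is guaranteed to contain all of them.
classes = np.unique(y.numpy())

for Xb, yb in loader:
    # Convert tensors back to numpy arrays for sklearn.
    clf.partial_fit(Xb.numpy(), yb.numpy(), classes=classes)
```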
Upvotes: 1