Pepe

Reputation: 311

Using Pytorch's dataloaders & transforms with sklearn

I have been using PyTorch a lot and got used to its DataLoaders and transforms, in particular for data augmentation, as they're very user-friendly and easy to understand.

However, I need to run some ML models from sklearn.

Is there a way to use PyTorch's DataLoaders with sklearn?

Upvotes: 2

Views: 4647

Answers (2)

Noob ML Dude

Reputation: 80

I recently came across the skorch library, which could help you.

"The goal of skorch is to make it possible to use PyTorch with sklearn. "

From the skorch docs:

class skorch.dataset.Dataset(X, y=None, length=None)
    General dataset wrapper that can be used in conjunction with PyTorch DataLoader.

I guess you could use this Dataset class to wrap your data, iterate over it with a PyTorch DataLoader, and feed the resulting batches to sklearn models. If you would like to use other PyTorch features, such as PyTorch tensors, you could do that too.
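For illustration, here is a minimal sketch of that idea; the synthetic X and y arrays are placeholders for your own data:

```python
# A minimal sketch, assuming synthetic numpy arrays stand in for real data.
import numpy as np
from torch.utils.data import DataLoader
from skorch.dataset import Dataset

X = np.random.rand(100, 10).astype(np.float32)  # hypothetical features
y = np.random.randint(0, 2, size=100)           # hypothetical labels

ds = Dataset(X, y)  # skorch's general-purpose dataset wrapper
loader = DataLoader(ds, batch_size=32, shuffle=True)

for X_batch, y_batch in loader:
    # the default collate function turns the numpy slices into torch tensors
    print(X_batch.shape, y_batch.shape)
```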

Upvotes: 0

Nicolas Gervais

Reputation: 36584

Yes, you can. You can do this with sklearn's partial_fit method; see the section of the scikit-learn user guide quoted below.

6.1.3. Incremental learning

Finally, for 3. we have a number of options inside scikit-learn. Although all algorithms cannot learn incrementally (i.e. without seeing all the instances at once), all estimators implementing the partial_fit API are candidates. Actually, the ability to learn incrementally from a mini-batch of instances (sometimes called “online learning”) is key to out-of-core learning as it guarantees that at any given time there will be only a small amount of instances in the main memory. Choosing a good size for the mini-batch that balances relevancy and memory footprint could involve some tuning [1].

Not all algorithms can do this, however.

Then you can use a PyTorch DataLoader to preprocess the data and feed it in batches to partial_fit.
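For example, here is a minimal sketch of that loop; the TensorDataset and synthetic tensors are placeholders for your own Dataset and transforms:

```python
# A minimal sketch, assuming a binary classification task.
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.linear_model import SGDClassifier

X = torch.randn(1000, 20)         # hypothetical features
y = torch.randint(0, 2, (1000,))  # hypothetical labels
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

clf = SGDClassifier()             # any estimator with partial_fit works
classes = np.unique(y.numpy())    # partial_fit needs the full label set up front

for X_batch, y_batch in loader:
    # the DataLoader yields torch tensors; sklearn expects numpy arrays
    clf.partial_fit(X_batch.numpy(), y_batch.numpy(), classes=classes)
```

Note the classes argument: since no single mini-batch is guaranteed to contain every label, partial_fit requires the complete set of classes on the first call.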

Upvotes: 1
