ffriend

Reputation: 28552

Combining labeled and unlabeled data in a single pipeline

I'm building an image classifier that uses a DBN for feature learning and logistic regression to fine-tune the resulting network. Normally, the most convenient way to implement such an architecture in scikit-learn is to use the Pipeline class. But in my case I have ~10K unlabeled images and only ~300 labeled ones. Naturally, I want to use all of the images to train the DBN and fit the logistic regression on only the labeled examples.

I could implement my own Pipeline class to handle this case, but first I'd like to know whether something like this already exists. Does it?

Upvotes: 0

Views: 136

Answers (1)

ogrisel

Reputation: 40169

The current scikit-learn Pipeline API is not well suited for supervised learning with unsupervised pre-training. Implementing your own wrapper class is probably the best way to go forward for that case.
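
For illustration, here is a minimal sketch of what such a wrapper might look like. The class name `PretrainedClassifier` and the `X_unlabeled` argument are hypothetical, not part of scikit-learn, and a single `BernoulliRBM` stands in for a full DBN (which would stack several RBMs). The data is random and only there to make the example run.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM


class PretrainedClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: fit the transformer on all rows, the classifier on labeled rows only."""

    def __init__(self, transformer, classifier):
        self.transformer = transformer
        self.classifier = classifier

    def fit(self, X, y, X_unlabeled=None):
        X = np.asarray(X)
        # Unsupervised step: use labeled and unlabeled samples together.
        if X_unlabeled is None:
            X_all = X
        else:
            X_all = np.vstack([X, np.asarray(X_unlabeled)])
        self.transformer_ = clone(self.transformer).fit(X_all)
        # Supervised step: only the labeled samples reach the classifier.
        features = self.transformer_.transform(X)
        self.classifier_ = clone(self.classifier).fit(features, y)
        self.classes_ = self.classifier_.classes_
        return self

    def predict(self, X):
        return self.classifier_.predict(self.transformer_.transform(X))


# Synthetic stand-in data: values in [0, 1], as BernoulliRBM expects.
rng = np.random.RandomState(0)
X_unlabeled = rng.rand(1000, 64)
X_labeled = rng.rand(30, 64)
y_labeled = np.arange(30) % 2

model = PretrainedClassifier(
    transformer=BernoulliRBM(n_components=16, n_iter=5, random_state=0),
    classifier=LogisticRegression(max_iter=1000),
)
model.fit(X_labeled, y_labeled, X_unlabeled=X_unlabeled)
predictions = model.predict(X_labeled)
```

Passing the unlabeled rows as an extra `fit` argument keeps `X` and `y` the same length, which is what the rest of scikit-learn expects. The trade-off is that generic tools such as `cross_val_score` will not know to forward `X_unlabeled` without additional plumbing.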

Upvotes: 2
