Reputation: 1246
I have a dataset that is much too large to fit in memory, so I must train models in batches. I have wrapped my model in a GridSearchCV, a RandomizedSearchCV, or a BayesSearchCV (from scikit-optimize), and I see that I cannot train multiple instances of these on different parts of my enormous dataset and expect the best hyperparameters found by each to agree.
I have considered wrapping my estimators in a BatchVoter (of my own design) that manages reading from the database in batches and keeps a list of models. Passing this to the XSearchCV and updating the parameter-space dictionary so all keys lead with 'estimator__' would direct the search to set the parameters of the sub-object, but there is still a problem: a search is begun with a call to the .fit() method, which must take data.
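A rough sketch of what I have in mind (BatchVoter, load_batches, and the SGDClassifier base estimator here are simplified placeholders, not my actual code):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV


def load_batches(n_batches=5, batch_size=256, n_features=20):
    """Stand-in for reading successive batches from the database."""
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = rng.integers(0, 2, size=batch_size)
        yield X, y


class BatchVoter(BaseEstimator, ClassifierMixin):
    """Wraps an estimator and fits it incrementally over database batches."""

    def __init__(self, estimator=None, n_batches=5):
        self.estimator = estimator
        self.n_batches = n_batches

    def fit(self, X, y=None):
        # The data passed in here is ignored; the real data is streamed
        # in batches, which is exactly the awkward part of this design.
        self.model_ = clone(self.estimator)
        for X_batch, y_batch in load_batches(self.n_batches):
            self.model_.partial_fit(X_batch, y_batch, classes=[0, 1])
        return self

    def predict(self, X):
        return self.model_.predict(X)


# Keys lead with 'estimator__' so the search sets parameters on the sub-object.
param_grid = {"estimator__alpha": [1e-4, 1e-3, 1e-2]}
search = GridSearchCV(BatchVoter(SGDClassifier()), param_grid, cv=3)
# search.fit(...) still demands an in-memory X and y, which is the problem.
```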
Is there a clever way to use the native GridSearchCV with data that is too big to pass to the .fit() method?
Upvotes: 2
Views: 1323
Reputation: 4327
Try dask. It supports DataFrames, arrays, and general collections. It consists of a scheduler and workers, and it also has a distributed scheduler, which lets you process data frames across several PCs.
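A minimal sketch of that setup (the CSV pattern and column name are placeholders):

```python
import dask.dataframe as dd
from dask.distributed import Client

# Client() with no arguments starts a local scheduler and workers;
# pass the address of a remote scheduler to spread work over several PCs.
client = Client()

# The data is read lazily in partitions, so it never has to fit in memory at once.
df = dd.read_csv("data-*.csv")
print(df["target"].value_counts().compute())
```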
Here is the description of how to parallelize models.
Here is the link to a complete module that could serve as a drop-in replacement for GridSearchCV.
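Assuming that module is dask-ml's GridSearchCV (dask_ml.model_selection), a rough sketch of using it as the replacement looks like this (the toy data and SGDClassifier are only placeholders):

```python
import dask.array as da
import numpy as np
from dask_ml.model_selection import GridSearchCV  # same interface as sklearn's GridSearchCV
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
# Chunked dask arrays stand in for data too large to hold in memory at once.
X = da.from_array(rng.normal(size=(10_000, 20)), chunks=(1_000, 20))
y = da.from_array(rng.integers(0, 2, size=10_000), chunks=1_000)

search = GridSearchCV(SGDClassifier(), {"alpha": [1e-4, 1e-3, 1e-2]}, cv=3)
search.fit(X, y)  # the scheduler farms the cross-validation fits out to workers
print(search.best_params_)
```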
Upvotes: 1