Maxime Heuillet

Reputation: 25

This estimator does not support Dask dataframes

I am trying to fit a model with Dask, but the estimator used in the example reports that it does not accept Dask dataframes. Can someone help me, please?

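    import numpy as np
    import dask.dataframe as dd
    # train_test_split is assumed to come from dask_ml, since it is applied to Dask collections
    from dask_ml.model_selection import IncrementalSearchCV, train_test_split
    from sklearn.linear_model import SGDClassifier

    # X and y are pandas objects; classes holds the label values (all defined elsewhere)
    ddx = dd.from_pandas(X, chunksize=100000)
    ddy = dd.from_pandas(y, chunksize=100000)
    X_train, X_test, y_train, y_test = train_test_split(ddx, ddy)

    model = SGDClassifier(loss='log')
    params = {'alpha': np.logspace(-2, 1, num=1000)}
    search = IncrementalSearchCV(model, params,
                                 n_initial_parameters=10, random_state=0)
    search.fit(X_train, y_train, classes=classes)  # the TypeError is raised here
    y_pred = search.predict_proba(X_test)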

The error is: TypeError: This estimator does not support dask dataframes.

It is raised on the search.fit line. When I replace fit with partial_fit, that call works, but the same error is then raised on the predict_proba line.

Upvotes: 1

Views: 350

Answers (1)

TomAugspurger

Reputation: 28926

IncrementalSearchCV currently requires Dask Arrays. Perhaps you can convert your dataframes to arrays before fitting.

I opened https://github.com/dask/dask-ml/issues/628 to support dataframes. Help would be welcome if you're interested in working on it.

Upvotes: 2
