Reputation: 25
I am trying to fit a model using the Dask framework, and the estimator used in the example says it does not accept Dask dataframes. Can someone help me, please?
import dask.dataframe as dd
import numpy as np
from dask_ml.model_selection import IncrementalSearchCV, train_test_split
from sklearn.linear_model import SGDClassifier

# X, y and classes (the array of class labels) are defined earlier
ddx = dd.from_pandas(X, chunksize=100000)
ddy = dd.from_pandas(y, chunksize=100000)
X_train, X_test, y_train, y_test = train_test_split(ddx, ddy)

model = SGDClassifier(loss='log')  # named 'log_loss' in scikit-learn >= 1.1
params = {'alpha': np.logspace(-2, 1, num=1000)}
search = IncrementalSearchCV(model, params,
                             n_initial_parameters=10, random_state=0)
search.fit(X_train, y_train, classes=classes)  # TypeError is raised here
y_pred = search.predict_proba(X_test)
The error is: TypeError: This estimator does not support dask dataframes.
It is raised on the search.fit line. When I replace fit with partial_fit it works, but then the same error occurs on the predict_proba line.
Upvotes: 1
Views: 350
Reputation: 28926
IncrementalSearchCV currently requires Dask Arrays, so you may be able to work around this by converting your data.
I opened https://github.com/dask/dask-ml/issues/628 to support dataframes. I would welcome help if you're interested in working on it.
Upvotes: 2