Reputation: 128
How to run dask_cuML (logistic regression, for example) on a large dataset (dask_cudf)?
I cannot run cuML on my cudf DataFrame because the dataset is large, so I get an "out of memory" error as soon as I try anything. The bright side is that I have 4 GPUs to use with dask_cudf.
Does anybody know the steps to run, for example, logistic regression on a dask_cudf DataFrame?
About my current cudf and cuml logistic regression code:
import cudf
import cuml

type(gdf)   # -> cudf.core.dataframe.DataFrame
logreg = cuml.LogisticRegression(penalty='none', tol=1e-6, max_iter=10000)
logreg.fit(gdf[['A', 'B', 'C', 'D', 'E']], gdf['Z'])
My thoughts, in steps (not working!):
1- Convert the cudf DataFrame to dask_cudf:
ddf = dask_cudf.from_cudf(gdf, npartitions=2)  # how should I choose the number of partitions? (my guess is in the sketch after step 4)
2- meta_dtypes = dict(zip(ddf.columns, ddf.dtypes))
3- Define a per-partition fit function:
def logistic_regression(gdf):
    return logreg.fit(gdf[['A', 'B', 'C', 'D', 'E']], gdf['Z'])
4- ddf = ddf.map_partitions(logistic_regression, meta=meta_dtypes)
ddf.compute().persist()
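For context, here is how I planned to set up the workers before the steps above -- a minimal sketch, assuming dask_cuda's LocalCUDACluster and guessing one partition per GPU (4 in my case):

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

# One dask worker per visible GPU
cluster = LocalCUDACluster()
client = Client(cluster)

# Guess: one partition per GPU so each worker gets a chunk
ddf = dask_cudf.from_cudf(gdf, npartitions=4)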
Any suggestions or insights are appreciated!
Upvotes: 1
Views: 493
Reputation: 86
Thank you for trying out cuML! The official release of cuML doesn't have multi-GPU logistic regression yet (coming soon!). I'm implementing a workaround using dask-glm and cupy. I'll publish my notebook in this thread once it is ready. Here are the general steps:
ddf = dask_cudf.read_csv("*.csv")
X = ddf[['A', 'B', 'C', 'D', 'E']].values
y = ddf['Z'].values
where each chunk of the dask array is a cupy array.
from dask_glm.estimators import LogisticRegression
clf = LogisticRegression()
clf.fit(X, y)
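Once it's fitted, a rough sketch of how I expect the result to be used (assuming dask_glm's predict and the same column names as in your question):

preds = clf.predict(X)   # lazy dask array of predictions
preds.compute()          # materialize the predictions on the client
clf.coef_                # fitted coefficients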
Upvotes: 3