Saurabh Patel

Reputation: 11

How to implement KNNImputer in GPU?

I’m working with a large dataset on Kaggle and want to speed up the imputation process by using GPU acceleration for KNN imputation. My current approach uses the CPU-based KNNImputer from sklearn, but it’s too slow for my needs.

I’ve heard that RAPIDS cuML offers GPU-accelerated KNN imputation. Here’s the code I’ve tried so far:

import pandas as pd
import cudf
from cuml.experimental.preprocessing import KNNImputer

# Convert Pandas DataFrame to cuDF DataFrame
df_bad_cleaned_gpu = cudf.DataFrame.from_pandas(df_bad_cleaned)

# Initialize KNN imputer with neighbors
knn_imputer_gpu = KNNImputer(n_neighbors=36)

# Fit and transform
df_bad_knn_filled_gpu = knn_imputer_gpu.fit_transform(df_bad_cleaned_gpu)

# Convert back to Pandas DataFrame (if needed)
df_bad_knn_filled = df_bad_knn_filled_gpu.to_pandas()

Is this the correct way to implement KNN imputation on the GPU using RAPIDS?

Upvotes: 1

Views: 45

Answers (1)

TaureanDyerNV

Reputation: 1291

The issue for this still seems to be open at the time I am writing this post: https://github.com/rapidsai/cuml/issues/4694.

Someone is attempting an implementation towards the end of that thread, so you can reply to them if you want to follow along. KNNImputer is also not documented as a feature, even in our current nightlies, so your code will not work and should not be expected to work. Please do check our stable and nightly docs for the latest available algorithms.
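In the meantime, if you need something GPU-resident today, a rough KNN-style imputation can be hand-rolled from cuml.neighbors.NearestNeighbors, which cuML does ship. The following is only a sketch, not the implementation discussed in the linked issue: it assumes an all-float frame with missing values encoded as NaN, fits only on fully observed rows, and refits the neighbor search per incomplete row (simple but slow); the helper name knn_impute_gpu is made up for illustration.

```python
import cupy as cp
import cudf
from cuml.neighbors import NearestNeighbors

def knn_impute_gpu(df: cudf.DataFrame, n_neighbors: int = 5) -> cudf.DataFrame:
    """Rough KNN imputation: fill each incomplete row's missing columns
    with the mean of its nearest fully observed rows. Assumes all
    columns are floats and missing values are encoded as NaN."""
    X = df.to_cupy()                                  # (n_rows, n_cols) on the GPU
    complete_rows = X[~cp.isnan(X).any(axis=1)]       # rows with no missing values
    filled = X.copy()

    for i in cp.where(cp.isnan(X).any(axis=1))[0].tolist():
        row = X[i]
        obs = cp.where(~cp.isnan(row))[0]             # observed columns for this row
        miss = cp.where(cp.isnan(row))[0]             # missing columns for this row

        # Find neighbors among the complete rows using only this row's
        # observed columns, so every distance is over the same features.
        nn = NearestNeighbors(n_neighbors=n_neighbors)
        nn.fit(complete_rows[:, obs])
        _, idx = nn.kneighbors(row[obs].reshape(1, -1))

        # Fill the missing columns with the neighbors' column means.
        neighbors = complete_rows[idx.ravel()]
        filled[i, miss] = neighbors[:, miss].mean(axis=0)

    return cudf.DataFrame(filled, columns=df.columns)
```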

Also, since we're talking best practices: if you're using pandas, you can use cudf.pandas instead of importing both cudf and pandas. Then you don't have to do the conversion steps between the two (it will be done for you).
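For example, a minimal sketch of that approach (the file path is hypothetical):

```python
# Enable the cudf.pandas accelerator before importing pandas, so that
# pandas calls are transparently run on the GPU where supported.
import cudf.pandas
cudf.pandas.install()

import pandas as pd

# From here on, ordinary pandas code is GPU-accelerated where possible,
# with no explicit cudf <-> pandas conversions needed.
df = pd.read_csv("your_data.csv")
print(df.describe())
```

In a Jupyter notebook you can instead run `%load_ext cudf.pandas` at the top, or launch a script with `python -m cudf.pandas your_script.py`, and then use plain pandas imports throughout.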

Upvotes: 0
