Saurabh Patel

Reputation: 11

How to implement KNNImputer in GPU?

I’m working with a large dataset on Kaggle and want to speed up the imputation process by using GPU acceleration for KNN imputation. My current approach uses the CPU-based KNNImputer from sklearn, but it’s too slow for my needs.

I’ve heard that RAPIDS cuML offers GPU-accelerated KNN imputation. Here’s the code I’ve tried so far:

import pandas as pd
import cudf
from cuml.experimental.preprocessing import KNNImputer

# Convert Pandas DataFrame to cuDF DataFrame
df_bad_cleaned_gpu = cudf.DataFrame.from_pandas(df_bad_cleaned)

# Initialize KNN imputer with neighbors
knn_imputer_gpu = KNNImputer(n_neighbors=36)

# Fit and transform
df_bad_knn_filled_gpu = knn_imputer_gpu.fit_transform(df_bad_cleaned_gpu)

# Convert back to Pandas DataFrame (if needed)
df_bad_knn_filled = df_bad_knn_filled_gpu.to_pandas()

Is this the correct way to implement KNN imputation on the GPU using RAPIDS?

Upvotes: 1

Views: 45

Answers (1)

TaureanDyerNV

Reputation: 1291

The issue for this still seems to be open at the time I am writing this post: https://github.com/rapidsai/cuml/issues/4694.

Someone is attempting an implementation towards the end of that thread, so you can reply to them if you want to follow along. KNNImputer is also not documented as a feature, even in our current nightlies, so your code will not work and should not be expected to work. Please do check our stable and nightly docs for the latest available algorithms.
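In the meantime, if you need something GPU-resident today, a rough KNN-style imputation can be hand-rolled from cuml.neighbors.NearestNeighbors, which cuML does ship. The following is only a sketch, not the implementation discussed in the linked issue: it assumes an all-float frame with missing values encoded as NaN, fits only on fully observed rows, and refits the neighbor search per incomplete row (simple but slow); the helper name knn_impute_gpu is made up for illustration.

```python
import cupy as cp
import cudf
from cuml.neighbors import NearestNeighbors

def knn_impute_gpu(df: cudf.DataFrame, n_neighbors: int = 5) -> cudf.DataFrame:
    """Rough KNN imputation: fill each incomplete row's missing columns
    with the mean of its nearest fully observed rows. Assumes all
    columns are floats and missing values are encoded as NaN."""
    X = df.to_cupy()                                  # (n_rows, n_cols) on the GPU
    complete_rows = X[~cp.isnan(X).any(axis=1)]       # rows with no missing values
    filled = X.copy()

    for i in cp.where(cp.isnan(X).any(axis=1))[0].tolist():
        row = X[i]
        obs = cp.where(~cp.isnan(row))[0]             # observed columns for this row
        miss = cp.where(cp.isnan(row))[0]             # missing columns for this row

        # Find neighbors among the complete rows using only this row's
        # observed columns, so every distance is over the same features.
        nn = NearestNeighbors(n_neighbors=n_neighbors)
        nn.fit(complete_rows[:, obs])
        _, idx = nn.kneighbors(row[obs].reshape(1, -1))

        # Fill the missing columns with the neighbors' column means.
        neighbors = complete_rows[idx.ravel()]
        filled[i, miss] = neighbors[:, miss].mean(axis=0)

    return cudf.DataFrame(filled, columns=df.columns)
```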

Also, since we're talking best practices: if you're using pandas, you can use cudf.pandas instead of importing both cudf and pandas. Then you don't have to do the conversion steps between the two (it will be done for you).
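For example, a minimal sketch of that approach (the file path is hypothetical):

```python
# Enable the cudf.pandas accelerator before importing pandas, so that
# pandas calls are transparently run on the GPU where supported.
import cudf.pandas
cudf.pandas.install()

import pandas as pd

# From here on, ordinary pandas code is GPU-accelerated where possible,
# with no explicit cudf <-> pandas conversions needed.
df = pd.read_csv("your_data.csv")
print(df.describe())
```

In a Jupyter notebook you can instead run `%load_ext cudf.pandas` at the top, or launch a script with `python -m cudf.pandas your_script.py`, and then use plain pandas imports throughout.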

Upvotes: 0
