I just noticed that the over-/under-sampling methods from the imbalanced-learn (imblearn) package now emit a FutureWarning when run in parallel via the n_jobs=x argument:
FutureWarning: The parameter `n_jobs` has been deprecated in 0.10 and will be removed in 0.12. You can pass an nearest neighbors estimator where `n_jobs` is already set instead.
So, instead of passing an int to n_jobs, we should now provide an estimator such as sklearn.neighbors.KNeighborsClassifier with n_jobs already set?
Like in this screenshot? In this example it gives a ~10% speed-up.
Is there anything else to consider here?
MRE code from the notebook (screenshotted above):
from imblearn.over_sampling import SMOTE
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create an imbalanced sample dataset
N = 500000
n_features = 200
X, y = make_classification(n_samples=N,
                           n_features=n_features,
                           n_clusters_per_class=1,
                           weights=[0.99],
                           flip_y=0,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

%%timeit -n 5
# Deprecated style: plain SMOTE(), neighbors search runs single-threaded
global X_train, y_train
sampler = SMOTE()
_, _ = sampler.fit_resample(X_train, y_train)

knn = KNeighborsClassifier(n_jobs=3)

%%timeit -n 5
# New style: pass a neighbors estimator with n_jobs already set
global X_train, y_train, knn
sampler = SMOTE(k_neighbors=knn)
_, _ = sampler.fit_resample(X_train, y_train)