Rohaan Manzoor
Rohaan Manzoor

Reputation: 197

kfold cross validation wont terminate, stuck at cross_val_score

I am trying to run kfold cross validation. but for some reason, it gets stuck here, it wont terminate from here accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1) i cant understand whats the problem. and how do i fix it.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
X = X[:,1:]

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)


import keras
from keras.models import Sequential #Required to initialize the ANN
from keras.layers import Dense #Build layers of ANN
from keras.layers import Dropout


# Evaluating the ANN
import keras
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score

from keras.models import Sequential #Required to initialize the ANN
from keras.layers import Dense #Build layers of ANN

def build_classifier(): # Builds the architecture, or the classifier
    classifier = Sequential()
    classifier.add(Dense(activation = 'relu', input_dim = 11, units = 6, kernel_initializer = 'uniform'))# add layers
    classifier.add(Dense(activation = 'relu', units = 6, kernel_initializer = 'uniform'))# add layers
    classifier.add(Dense(activation = 'sigmoid', units = 1, kernel_initializer = 'uniform'))
    classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier
classifier = KerasClassifier(build_fn = build_classifier, batch_size = 10, nb_epoch = 100)
accuracies  = cross_val_score(estimator  = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)
mean = accuracies.mean()
variance = accuracies.std()

Edit
Im on windows 10 using Anaconda with python 3.6.
Dataset : Drive Link for dataset
It works perfectly when i set n_jobs = 1 but not when n_jobs = -1

Upvotes: 1

Views: 1200

Answers (1)

Gambit1614
Gambit1614

Reputation: 8801

Since you have set the n_jobs = -1, then all the CPUs are being utlised as per the documentation mentioned here. However, you must understand that utilising all the CPUs does not necessarily may lead to reduction in execution time because:

  • There is an overhead invovled with creation and allocation of reasources to new threads.
  • Also, there might be other bottlenecks like data being to large to be broadcasted to all threads at the same time, thread pre-emption over RAM (or other resouces,etc.), how data is pushed into each thread, etc.
  • Also multithreading in Python has various shortcomings, see here and here.

You can check out a similar issue with GridSearchCV and parallization here in this answer.

Also, as mentioned by @ncfith, there is no current solution for this problem.

References

Upvotes: 1

Related Questions