Reozilla

Reputation: 1

Python mlxtend OneRClassifier with cross_val_score: 'index 5 is out of bounds for axis 0 with size 3' error

I'm trying to run cross-validation with mlxtend's OneRClassifier, using KFold and cross_val_score from sklearn.model_selection. My data has four columns: the first three are features and the fourth is the label. This is X (the features) after encoding each column with LabelEncoder:

[[ 0  0  9]
 [ 0  6  9]
 [ 0  3  8]
 [ 0  9  4]
 [ 0  3  8]
 [ 0  9 12]
 [ 0  1  2]
 [ 0  0  0]
 [ 0  9  5]
 [ 0  7 12]
 [ 0  5 13]
 [ 0  3  1]
 [ 0  9  8]
 [ 0  5 13]
 [ 0  0 11]
 [ 0  3  7]
 [ 0 10 14]
 [ 0  2  4]
 [ 0  4  3]
 [ 0  3  4]
 [ 0  3 12]
 [ 0  3 13]
 [ 0  8 10]
 [ 0  8  4]
 [ 0  8  1]
 [ 0  8 13]
 [ 0  3  8]
 [ 0  4  6]
 [ 0  5  1]
 [ 0  2 12]]

This is y (the encoded labels):

[5 2 5 0 5 0 5 5 0 2 5 1 0 5 5 5 3 5 5 5 5 1 5 5 5 5 5 5 4 5]

The shapes are (30, 3) for X and (30,) for y.
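
For reference, the class distribution of y and the labels that reach each training half can be checked with a sketch like the one below. It is a hand-rolled shuffled two-way split in plain NumPy standing in for KFold(n_splits=2, shuffle=True); the seed 0 and the variable names are illustrative assumptions, not part of my actual code.

```python
import numpy as np

# Label vector exactly as printed above.
y = np.array([5, 2, 5, 0, 5, 0, 5, 5, 0, 2, 5, 1, 0, 5, 5,
              5, 3, 5, 5, 5, 5, 1, 5, 5, 5, 5, 5, 5, 4, 5])

# Number of samples for each class label 0..5.
class_counts = np.bincount(y)
print("samples per class:", class_counts.tolist())

# Shuffle the row indices and cut them into two halves, imitating a
# shuffled 2-fold split (seed chosen arbitrarily for repeatability).
rng = np.random.default_rng(0)
halves = np.array_split(rng.permutation(len(y)), 2)

# With two folds, each half serves as the training set exactly once.
labels_per_training_half = [np.unique(y[idx]).tolist() for idx in halves]
for i, labels in enumerate(labels_per_training_half):
    print(f"training half {i}: labels present = {labels}")
```

Classes 3 and 4 have a single sample each, so however the rows are shuffled, at least one training half cannot contain all six labels, while label 5 (20 of the 30 samples) appears in both halves.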

 accuraciesOneR = cross_val_score(oneR, X, y, cv=kf, scoring='accuracy', error_score='raise')

This raises the error 'index 5 is out of bounds for axis 0 with size 3' during cross-validation. As far as I can see the shapes are correct, but I might be wrong. Any advice would be appreciated.

The exception:

Unhandled exception. Python.Runtime.PythonException: index 5 is out of bounds for axis 0 with size 4
  File "C:\Python311\Lib\site-packages\mlxtend\classifier\oner.py", line 116, in fit
    inverse_index[most_frequent_class] = False
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\sklearn\model_selection\_validation.py", line 866, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Python311\Lib\site-packages\sklearn\utils\parallel.py", line 139, in __call__
    return self.function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\joblib\parallel.py", line 1847, in _get_sequential_output
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\joblib\parallel.py", line 1918, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\sklearn\utils\parallel.py", line 77, in __call__
    return super().__call__(iterable_with_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\sklearn\model_selection\_validation.py", line 411, in cross_validate
    results = parallel(
              ^^^^^^^^^
  File "C:\Python311\Lib\site-packages\sklearn\utils\_param_validation.py", line 216, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\sklearn\model_selection\_validation.py", line 684, in cross_val_score
    cv_results = cross_validate(
                 ^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\sklearn\utils\_param_validation.py", line 216, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 74, in train_and_predict
   at Python.Runtime.PythonException.ThrowLastAsClrException()
   at Python.Runtime.PyObject.Invoke(PyTuple args, PyDict kw)
   at Python.Runtime.PyObject.TryInvoke(InvokeBinder binder, Object[] args, Object& result)
   at CallSite.Target(Closure , CallSite , Object , PyObject )
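
The last Python frame points at the statement inverse_index[most_frequent_class] = False. Below is a minimal NumPy sketch of how that statement could fail. It assumes (this is not confirmed from the mlxtend source) that the boolean mask is sized by the number of distinct labels in the training fold, while the index is the label value itself. y_train and the other names are hypothetical.

```python
import numpy as np

# Hypothetical training fold: three distinct labels, the largest being 5.
y_train = np.array([5, 2, 5, 0, 5, 5, 0, 5])

# Assumed sizing: one mask slot per distinct label present in the fold.
n_class_labels = np.unique(y_train).shape[0]

# Counting by label value produces an array indexed 0..max(label).
class_counts = np.bincount(y_train)
most_frequent_class = int(np.argmax(class_counts))

# Indexing the short mask with the label value goes out of bounds.
inverse_index = np.ones(n_class_labels, dtype=bool)
try:
    inverse_index[most_frequent_class] = False
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

If that assumption holds, the size in the message would change with the number of distinct labels in the shuffled fold, which would be consistent with seeing size 3 in one run and size 4 in another.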

The code I used:

from sklearn.preprocessing import LabelEncoder
from mlxtend.classifier import OneRClassifier
from sklearn.model_selection import KFold, train_test_split, cross_val_score
from sklearn.metrics import accuracy_score
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin
import numpy as np

class ZeroR(BaseEstimator, ClassifierMixin):
    def fit(self, X, y):
        self.most_common_class_ = np.bincount(y).argmax()
        return self

    def predict(self, X):
        return np.full(X.shape[0], self.most_common_class_)

def train_and_predict(data):
    from sklearn import datasets
    df = pd.DataFrame(data, columns=['A','B', 'C', 'D'])
    
    label_encoder = LabelEncoder()
    df['A'] = label_encoder.fit_transform(df['A'])
    df['B'] = label_encoder.fit_transform(df['B'])
    df['C'] = label_encoder.fit_transform(df['C'])
    df['D'] = label_encoder.fit_transform(df['D'])
   
    # Features are the first three encoded columns; the label is the fourth.
    X = df.to_numpy()[:, [0, 1, 2]].copy()
    y = df.to_numpy()[:, 3].copy()
  

    # Define the KFold cross-validator
    kf = KFold(n_splits=2, shuffle=True)

    # Initialize the OneR and zeroR classifier
    oneR = OneRClassifier()
    zeroR = ZeroR()

    print(X.shape)
    print(y.shape)

    # Perform cross-validation with error_score='raise'
    accuraciesOneR = cross_val_score(oneR, X, y, cv=kf, scoring='accuracy', error_score='raise')
    #accuraciesZeroR = cross_val_score(zeroR, X.to_numpy(copy=True), y.to_numpy(copy=True), cv=kf, scoring='accuracy', error_score='raise')


    # Calculate the mean accuracy
    #mean_accuracy_ZeroR = accuraciesZeroR.mean()
    mean_accuracy_OneR = accuraciesOneR.mean()
    print(mean_accuracy_OneR)
     

    return 0, mean_accuracy_OneR

This is a cross-validation of the data with OneR in Python. I expected the cross-validation to run without errors and return the accuracy scores.
