I'm trying to cross-validate a OneRClassifier (from mlxtend). For the cross-validation I use from sklearn.model_selection import KFold, train_test_split, cross_val_score. My data has four columns: the first three are features and the fourth is the label. This is the printout of X (the features) after encoding with LabelEncoder:
[[ 0 0 9]
[ 0 6 9]
[ 0 3 8]
[ 0 9 4]
[ 0 3 8]
[ 0 9 12]
[ 0 1 2]
[ 0 0 0]
[ 0 9 5]
[ 0 7 12]
[ 0 5 13]
[ 0 3 1]
[ 0 9 8]
[ 0 5 13]
[ 0 0 11]
[ 0 3 7]
[ 0 10 14]
[ 0 2 4]
[ 0 4 3]
[ 0 3 4]
[ 0 3 12]
[ 0 3 13]
[ 0 8 10]
[ 0 8 4]
[ 0 8 1]
[ 0 8 13]
[ 0 3 8]
[ 0 4 6]
[ 0 5 1]
[ 0 2 12]]
This is the printout of y:
[5 2 5 0 5 0 5 5 0 2 5 1 0 5 5 5 3 5 5 5 5 1 5 5 5 5 5 5 4 5]
The shapes are (30, 3) for X and (30,) for y.
accuraciesOneR = cross_val_score(oneR, X, y, cv=kf, scoring='accuracy', error_score='raise')
The cross_val_score call fails with the error 'index 5 is out of bounds for axis 0 with size 3'. As far as I can tell the shapes are correct, but I might be wrong. Any advice would be appreciated.
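Since the shapes look fine, a useful next check is the labels themselves: how many samples each class has, and which labels end up in each training fold. The sketch below uses the y printed above and simulates a shuffled 2-fold split with NumPy; the seed and the np.array_split splitting are illustrative assumptions standing in for KFold(n_splits=2, shuffle=True). Classes 3 and 4 have a single sample each, so in any 2-fold split at least one training fold is missing one of them while still containing label 5, which means its labels are no longer the consecutive integers 0..k-1.

```python
import numpy as np

# Labels exactly as printed in the question (already LabelEncoder-encoded)
y = np.array([5, 2, 5, 0, 5, 0, 5, 5, 0, 2, 5, 1, 0, 5, 5,
              5, 3, 5, 5, 5, 5, 1, 5, 5, 5, 5, 5, 5, 4, 5])

# Samples per class 0..5; classes 3 and 4 occur only once each
class_counts = np.bincount(y)
print("samples per class:", class_counts)


def labels_are_contiguous(labels):
    """Return True if the distinct labels are exactly 0, 1, ..., k-1."""
    unique = np.unique(labels)
    return bool(np.array_equal(unique, np.arange(unique.size)))


# Illustrative shuffled 2-fold split (assumption: seed chosen arbitrarily)
rng = np.random.default_rng(seed=0)
test_folds = np.array_split(rng.permutation(y.size), 2)

contiguous_flags = []
for fold, test_idx in enumerate(test_folds):
    train_mask = np.ones(y.size, dtype=bool)
    train_mask[test_idx] = False  # training fold = every sample not in the test fold
    y_train = y[train_mask]
    ok = labels_are_contiguous(y_train)
    contiguous_flags.append(ok)
    print(f"fold {fold}: training labels {np.unique(y_train)}, contiguous: {ok}")
```

Whatever the seed, the fold whose training part lacks class 3 (or class 4) reports contiguous: False.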
The exception:
Unhandled exception. Python.Runtime.PythonException: index 5 is out of bounds for axis 0 with size 4
File "C:\Python311\Lib\site-packages\mlxtend\classifier\oner.py", line 116, in fit
inverse_index[most_frequent_class] = False
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\sklearn\model_selection\_validation.py", line 866, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Python311\Lib\site-packages\sklearn\utils\parallel.py", line 139, in __call__
return self.function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\joblib\parallel.py", line 1847, in _get_sequential_output
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\joblib\parallel.py", line 1918, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\sklearn\utils\parallel.py", line 77, in __call__
return super().__call__(iterable_with_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\sklearn\model_selection\_validation.py", line 411, in cross_validate
results = parallel(
^^^^^^^^^
File "C:\Python311\Lib\site-packages\sklearn\utils\_param_validation.py", line 216, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\sklearn\model_selection\_validation.py", line 684, in cross_val_score
cv_results = cross_validate(
^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\sklearn\utils\_param_validation.py", line 216, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 74, in train_and_predict
at Python.Runtime.PythonException.ThrowLastAsClrException()
at Python.Runtime.PyObject.Invoke(PyTuple args, PyDict kw)
at Python.Runtime.PyObject.TryInvoke(InvokeBinder binder, Object[] args, Object& result)
at CallSite.Target(Closure , CallSite , Object , PyObject )
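The failing line in the traceback is inverse_index[most_frequent_class] = False inside OneR's fit. The snippet below is a simplified, hypothetical reproduction of that pattern, not mlxtend's actual code: it assumes the boolean mask is sized by the number of distinct labels in the training fold, while the class counts come from np.bincount, which is indexed by the raw label value. The first 15 labels of y serve as a stand-in training fold in which classes 3 and 4 are absent.

```python
import numpy as np

# Hypothetical training fold: the first 15 labels of y (classes 3 and 4 are absent)
y_train = np.array([5, 2, 5, 0, 5, 0, 5, 5, 0, 2, 5, 1, 0, 5, 5])

# Assumption: the mask is sized by the number of distinct labels (0, 1, 2, 5 -> 4)
n_class_labels = np.unique(y_train).size

# np.bincount is indexed by label value, so its length is max label + 1 = 6
class_counts = np.bincount(y_train, minlength=n_class_labels)
most_frequent_class = np.argmax(class_counts)  # 5: a label value, not a position

inverse_index = np.ones(n_class_labels, dtype=bool)  # only 4 entries
try:
    inverse_index[most_frequent_class] = False
    message = "no error"
except IndexError as err:
    message = str(err)

print(message)  # index 5 is out of bounds for axis 0 with size 4
```

If every training fold's labels were exactly 0..k-1, np.bincount would return k entries and the position would coincide with the label, which is why the same pattern works on the full, freshly encoded y.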
The code I used:
from sklearn.preprocessing import LabelEncoder
from mlxtend.classifier import OneRClassifier
from sklearn.model_selection import KFold, train_test_split, cross_val_score
from sklearn.metrics import accuracy_score
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin
import numpy as np

class ZeroR(BaseEstimator, ClassifierMixin):
    def fit(self, X, y):
        # Always predict the most frequent class seen during training
        self.most_common_class_ = np.bincount(y).argmax()
        return self

    def predict(self, X):
        return np.full(X.shape[0], self.most_common_class_)

def train_and_predict(data):
    df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
    # Encode every column (features A-C and label D) as integers
    label_encoder = LabelEncoder()
    df['A'] = label_encoder.fit_transform(df['A'])
    df['B'] = label_encoder.fit_transform(df['B'])
    df['C'] = label_encoder.fit_transform(df['C'])
    df['D'] = label_encoder.fit_transform(df['D'])
    X = df.to_numpy()[:, [0, 1, 2]].copy()
    y = df.to_numpy()[:, 3].copy()
    # Define the KFold cross-validator
    kf = KFold(n_splits=2, shuffle=True)
    # Initialize the OneR and ZeroR classifiers
    oneR = OneRClassifier()
    zeroR = ZeroR()
    print(X.shape)
    print(y.shape)
    # Perform cross-validation with error_score='raise'
    accuraciesOneR = cross_val_score(oneR, X, y, cv=kf, scoring='accuracy', error_score='raise')
    # X and y are already NumPy arrays, so no .to_numpy() call is needed here
    #accuraciesZeroR = cross_val_score(zeroR, X, y, cv=kf, scoring='accuracy', error_score='raise')
    # Calculate the mean accuracy
    #mean_accuracy_ZeroR = accuraciesZeroR.mean()
    mean_accuracy_OneR = accuraciesOneR.mean()
    print(mean_accuracy_OneR)
    return 0, mean_accuracy_OneR
This is cross-validation of my data with OneR in Python. I expected the cross-validation to run without errors and return accuracy scores.
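One possible workaround, sketched under the assumption that OneRClassifier needs each training fold's labels to be 0..k-1, is to re-encode y inside every fold instead of once for the whole dataset. ReencodeLabels below is a hypothetical wrapper, not part of mlxtend or scikit-learn: it fits a fresh LabelEncoder on each training fold, trains a clone of the wrapped estimator on the encoded labels, and maps predictions back to the original label values so accuracy is still computed against the real y.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.preprocessing import LabelEncoder


class ReencodeLabels(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: re-encode each training fold's labels to 0..k-1."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit the encoder on this fold only, so the labels become 0, 1, ..., k-1
        self.encoder_ = LabelEncoder()
        y_encoded = self.encoder_.fit_transform(y)
        self.classes_ = self.encoder_.classes_
        # Train a fresh copy of the wrapped estimator on the encoded labels
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, y_encoded)
        return self

    def predict(self, X):
        # Map the encoded predictions back to the original label values
        encoded = np.asarray(self.estimator_.predict(X), dtype=int)
        return self.encoder_.inverse_transform(encoded)
```

Usage would then be cross_val_score(ReencodeLabels(OneRClassifier()), X, y, cv=kf, scoring='accuracy', error_score='raise'). A test fold can still contain a class (such as 3 or 4) that never appeared in its training fold; the wrapper simply never predicts that class, so those samples count as misclassified rather than causing a crash.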