I'm encountering an error when using classification_report_imbalanced from imblearn.metrics on a classification task. The code runs smoothly until I add the classification_report_imbalanced function, which then results in the following error:
Found input variables with inconsistent numbers of samples: [200, 807]
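For reference, this message appears to come from the length check that scikit-learn applies to the two arrays passed to a metric. A minimal sketch reproducing the same message with scikit-learn alone, assuming two made-up arrays of 200 and 807 elements (hypothetical stand-ins, not the real data):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical stand-ins: 200 true labels versus 807 predictions.
y_true_demo = np.zeros(200, dtype=int)
y_pred_demo = np.zeros(807, dtype=int)

try:
    # Metrics validate that both arrays have the same number of samples.
    accuracy_score(y_true_demo, y_pred_demo)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this is the same check, the two numbers in the message would be the lengths of the first and second arguments, in that order.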
This issue arises even though the model evaluation with scikit-learn's metrics like accuracy_score, precision_score, etc., works without any problem. Here is a snippet where I integrate classification_report_imbalanced:
import pandas as pd
from IPython.display import display  # already available inside Jupyter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
from imblearn.metrics import classification_report_imbalanced
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
models = [
    ('Logistic Regression', LogisticRegression()),
    ('K-Nearest Neighbors', KNeighborsClassifier()),
    ('Support Vector Machines', SVC()),
    ('Decision Tree', DecisionTreeClassifier()),
    ('Random Forest', RandomForestClassifier()),
    ('AdaBoost', AdaBoostClassifier()),
    ('Gradient Boosting', GradientBoostingClassifier()),
    ('Naive Bayes', GaussianNB()),
    ('Neural Network', MLPClassifier()),
    ('XGBoost', XGBClassifier()),
    ('LightGBM', LGBMClassifier()),
    ('CatBoost', CatBoostClassifier(silent=True)),
]
def evaluate_model(model, X_test, y_test):
    # Predict on the held-out features and score against the matching labels.
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    return accuracy, precision, recall, f1
results = []
for name, model in models:
    model.fit(X_train, y_train)
    accuracy, precision, recall, f1 = evaluate_model(model, X_test, y_test)
    results.append([name, accuracy, precision, recall, f1])
results_df = pd.DataFrame(results, columns=['Model', 'Accuracy', 'Precision', 'Recall', 'F1 Score'])
display(results_df)
I've tried adjusting the X_train, X_test, y_train, and y_test arrays, but to no avail.
This is the dataset I am using.
It's been a while since I've worked with Python, and I'm unsure how to resolve this. Any suggestions or guidance would be greatly appreciated.