yuyudss

Reputation: 11

Sample Size Inconsistency Error with imblearn's classification_report_imbalanced

I get an error when I call classification_report_imbalanced from imblearn.metrics in a classification task. The code runs fine until I add that call, which then raises the following error:

Found input variables with inconsistent numbers of samples: [200, 807]

The error appears even though evaluating the models with scikit-learn's own metrics (accuracy_score, precision_score, and so on) works without any problem. Here is the code I am integrating classification_report_imbalanced into:

import pandas as pd
from IPython.display import display  # explicit import; only built in inside notebooks
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
from imblearn.metrics import classification_report_imbalanced

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = [
    ('Logistic Regression', LogisticRegression()),
    ('K-Nearest Neighbors', KNeighborsClassifier()),
    ('Support Vector Machines', SVC()),
    ('Decision Tree', DecisionTreeClassifier()),
    ('Random Forest', RandomForestClassifier()),
    ('AdaBoost', AdaBoostClassifier()),
    ('Gradient Boosting', GradientBoostingClassifier()),
    ('Naive Bayes', GaussianNB()),
    ('Neural Network', MLPClassifier()),
    ('XGBoost', XGBClassifier()),
    ('LightGBM', LGBMClassifier()),
    ('CatBoost', CatBoostClassifier(silent=True))
]

def evaluate_model(model, X_test, y_test):
    # Predict on the held-out split and score against the matching labels
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    return accuracy, precision, recall, f1

# Fit every model on the training split and collect one row of metrics per model
results = []
for name, model in models:
    model.fit(X_train, y_train)
    accuracy, precision, recall, f1 = evaluate_model(model, X_test, y_test)
    results.append([name, accuracy, precision, recall, f1])

results_df = pd.DataFrame(results, columns=['Model', 'Accuracy', 'Precision', 'Recall', 'F1 Score'])
display(results_df)

I've tried adjusting the X_train, X_test, y_train, and y_test arrays but to no avail. This is the dataset I am using.
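
The two numbers in the message are the lengths of the two arrays being compared. 200 matches the test split here (20% of 1000 samples), so the 807 presumably belongs to a different array, although where it comes from is not visible in the posted code. A small sketch of how the lengths could be inspected before calling the report, using scikit-learn's check_consistent_length; the helper describe_lengths and the placeholder arrays are hypothetical and only stand in for the real variables:

```python
import numpy as np
from sklearn.utils import check_consistent_length


def describe_lengths(**arrays):
    """Return a {name: length} mapping for each array passed by keyword."""
    return {name: len(values) for name, values in arrays.items()}


# Placeholder arrays with the two lengths from the error message
y_test_demo = np.zeros(200, dtype=int)
y_other_demo = np.zeros(807, dtype=int)

lengths = describe_lengths(y_true=y_test_demo, y_pred=y_other_demo)
print(lengths)

# check_consistent_length raises ValueError when the lengths differ
try:
    check_consistent_length(y_test_demo, y_other_demo)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Printing the lengths of the actual arguments right before the failing call should show which of the two has 807 entries.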

It's been a while since I've worked with Python, and I'm unsure how to resolve this. Any suggestions or guidance would be greatly appreciated.

Upvotes: 1

Views: 54

Answers (0)
