Reputation: 898
I have a multi-class problem with highly imbalanced data.
There is one large majority class with a few thousand members, some classes with 100–1000 members, and 10–30 classes with only one member.
Sampling isn't possible because it could distort the weighting of the classes.
To evaluate my model I want to use cross-validation. I tried cross_val_predict(x,y, cv=10),
which led to the warning:
Warning: The least populated class in y has only 1 members, which is too few. The minimum number of members in any class cannot be less than n_splits=10.
I tried to build my own cross-validation, which is pretty straightforward.
I split my data via StratifiedKFold and then did the following:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, classification_report

clf = DecisionTreeClassifier()
cnf_matrix, classRepo = {}, {}
for fold, (ta, te) in enumerate(splits):  # splits comes from StratifiedKFold.split(x, y)
    xTrain, xTest = x.iloc[ta], x.iloc[te]
    yTrain, yTest = y.iloc[ta], y.iloc[te]
    clf.fit(xTrain, yTrain)
    prediction = clf.predict(xTest)
    # index by fold number, not by the index array, and score against yTest, not all of y
    cnf_matrix[fold] = confusion_matrix(yTest, prediction)
    classRepo[fold] = classification_report(yTest, prediction)
Because I am working in a Jupyter notebook, I have to print every entry of cnf_matrix and classRepo by hand and go through them myself.
Is there a more elegant solution, such as fusing classRepo and cnf_matrix across the folds, so that I get the same result that cross_val_predict(x, y, cv=10) offers?
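(To illustrate what I mean by fusing, here is a rough sketch based on my loop above: pool the test-fold labels and predictions, then compute a single matrix/report over all folds.)

import numpy as np

yTrue, yPred = [], []
for ta, te in splits:
    clf.fit(x.iloc[ta], y.iloc[ta])
    yTrue.append(y.iloc[te])
    yPred.append(clf.predict(x.iloc[te]))
# one confusion matrix / report over all folds, like cross_val_predict would give
yTrue, yPred = np.concatenate(yTrue), np.concatenate(yPred)
print(confusion_matrix(yTrue, yPred))
print(classification_report(yTrue, yPred))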
Is there a better metric to tackle my problem?
Upvotes: 0
Views: 347
Reputation: 16109
"Sampling isn't possible because it could lead to a wrong weight of the classes."
That is a strong assertion, as you are assuming that your training data is a perfect representation of all remaining and future observable data. If I were on your team, I would challenge you to support that hypothesis with experimental data.
There are in fact many approaches developed specifically for dealing with minority class imbalance, for example SMOTE and ADASYN. I would point you towards imbalanced-learn, a Python package that implements these and other techniques within the sklearn framework.
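As a minimal sketch of how that could look (assuming oversampling turns out to be acceptable after all): imbalanced-learn's Pipeline applies the sampler to the training folds only, so it plugs straight into cross_val_predict. Note that SMOTE interpolates between same-class neighbours, so it cannot synthesise from your one-member classes; RandomOverSampler can handle them.

from imblearn.over_sampling import RandomOverSampler  # SMOTE needs more than one sample per class
from imblearn.pipeline import Pipeline                # resamples the training folds only, never the test fold
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

pipe = Pipeline([
    ('oversample', RandomOverSampler(random_state=0)),
    ('tree', DecisionTreeClassifier()),
])
# the one-member classes will still trigger the stratification warning,
# but you get a single set of out-of-fold predictions to score
pred = cross_val_predict(pipe, x, y, cv=10)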
Upvotes: 1