john doe

Reputation: 2253

Understanding UndefinedMetricWarning in classification report with scikit-learn?

I have a text classification task with 5 categories. The problem is that I am getting bad precision and the following warning, probably as a result of unbalanced data (I'm not sure):

/usr/local/lib/python2.7/site-packages/sklearn/metrics/metrics.py:1771: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.

I guess this warning was produced because the data is concentrated in label 5. How can I fix this warning, and how can I improve the results of the classification report? I also tried a grid search, which gave the following hyper-parameters:

Best parameters set:
    clf__C: 0.1
    vect__max_df: 0.25
    vect__ngram_range: (1, 1)
    vect__use_idf: True

Accuracy:
0.456923076923

But I am still getting bad results. Could anybody help me improve these results with SVC or another model?

Upvotes: 2

Views: 3831

Answers (1)

Andreas Mueller

Reputation: 28748

You can use a pipeline and grid-search the parameters of the TfidfVectorizer together with the C of the SVC. For example, try n-gram ranges of (1, 1), (1, 2) or (2, 2), set a different max_df, compare against CountVectorizer, and maybe try character n-grams (with a higher n-gram range), too.
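
As a rough sketch of what such a search could look like (not a tuned or definitive setup): the step names vect and clf mirror the parameter names in the question, while the linear kernel, the f1_macro scoring, the candidate values, and the helper name build_search are all assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_search(cv=3):
    """Build a grid search over vectorizer settings and the SVC's C.

    Hypothetical helper: the name, the grids, and the scoring are
    illustrative assumptions, not values taken from the question.
    """
    pipeline = Pipeline([
        ("vect", TfidfVectorizer()),
        ("clf", SVC(kernel="linear")),  # assumption: linear kernel for sparse text
    ])
    param_grid = [
        # Word n-grams, comparing TfidfVectorizer against CountVectorizer.
        {
            "vect": [TfidfVectorizer(), CountVectorizer()],
            "vect__ngram_range": [(1, 1), (1, 2), (2, 2)],
            "vect__max_df": [0.25, 0.5, 1.0],
            "clf__C": [0.1, 1, 10],
        },
        # Character n-grams with a higher n-gram range.
        {
            "vect": [TfidfVectorizer(analyzer="char_wb")],
            "vect__ngram_range": [(2, 4), (3, 5)],
            "clf__C": [0.1, 1, 10],
        },
    ]
    # Macro-averaged F1 weights every class equally, which can be more
    # informative than accuracy when the classes are unbalanced.
    return GridSearchCV(pipeline, param_grid, cv=cv, scoring="f1_macro")
```

Usage would be something like search = build_search(); search.fit(train_docs, train_labels), after which search.best_params_ and search.best_score_ report the best combination found. Here train_docs and train_labels stand in for your own data.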

Upvotes: 1
