Reputation: 2253
I have a text classification task with 5 categories. The problem is that I am getting poor precision along with this warning, probably as a result of unbalanced data (I'm not sure):
/usr/local/lib/python2.7/site-packages/sklearn/metrics/metrics.py:1771: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
I guess this warning appears because most of the data is concentrated in label 5. How can I fix this warning, and how can I improve the scores in the classification report? I also tried a grid search, which reported the following hyper-parameters:
Best parameters set:
clf__C: 0.1
vect__max_df: 0.25
vect__ngram_range: (1, 1)
vect__use_idf: True
Accuracy:
0.456923076923
But I am still getting bad results. Could anybody help me improve these results with SVC or another model?
Upvotes: 2
Views: 3831
Reputation: 28748
You can use a pipeline and grid-search the parameters of the TfidfVectorizer together with the C of the SVC. For example, try n-gram ranges of (1, 1), (1, 2), or (2, 2), try a different max_df, compare against CountVectorizer, and maybe try character n-grams (with a higher n-gram range) too.
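A minimal sketch of such a search might look like the following. The step names `vect` and `clf` mirror the parameter names in the question's output. The toy corpus, the linear kernel, the candidate values, and the use of `char_wb` for character n-grams are all assumptions made for illustration, not the asker's actual setup.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus; replace with your own documents and labels.
docs = [
    "cheap flights to paris", "hotel deals in rome", "train tickets to berlin",
    "book a flight to madrid", "last minute hotel offers", "weekend trip by train",
    "python list comprehension", "java null pointer exception",
    "rust borrow checker error", "python dict sort by value",
    "java stream filter example", "rust lifetime annotation",
]
labels = ["travel"] * 6 + ["code"] * 6

# The kernel is an assumption; the question does not say which one was used.
pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", SVC(kernel="linear")),
])

c_values = [0.1, 1, 10]
param_grid = [
    {   # word-level TF-IDF features
        "vect": [TfidfVectorizer()],
        "vect__ngram_range": [(1, 1), (1, 2), (2, 2)],
        "vect__max_df": [0.5, 1.0],
        "vect__use_idf": [True, False],
        "clf__C": c_values,
    },
    {   # raw counts, for comparison against TF-IDF
        "vect": [CountVectorizer()],
        "vect__ngram_range": [(1, 1), (1, 2), (2, 2)],
        "clf__C": c_values,
    },
    {   # character n-grams with a higher n-gram range
        "vect": [TfidfVectorizer(analyzer="char_wb")],
        "vect__ngram_range": [(2, 4), (3, 5)],
        "clf__C": c_values,
    },
]

# cv=3 uses stratified folds for classifiers; scoring defaults to accuracy.
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(docs, labels)

print("Best parameters set:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```

Passing a list of dictionaries keeps vectorizer-specific options such as `use_idf` away from `CountVectorizer`, which does not accept them. After fitting, `search.best_estimator_` is the refitted pipeline and can be used directly with `predict`.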
Upvotes: 1