Reputation: 53
I have both positive and negative training documents for a text classification problem. I am planning to calculate the chi-square value of every feature in each document. Given those values, how do I proceed to classification with an SVM? What would the threshold value for the classification be?
Upvotes: 0
Views: 935
Reputation: 16104
Chi-square values can be used for feature selection, as a pre-processing step. Note that the score is computed per feature over the whole labeled training set, not per document. By keeping only the top-scoring terms, you can greatly reduce your feature vocabulary (for example, select the most useful 100K terms from a 1M vocabulary). This step has two benefits: 1. a smaller model in the next step; 2. faster prediction. The downside: dropping features may or may not affect classification performance.
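To make the scoring step concrete, here is a minimal sketch of chi-square feature selection in plain Python. It assumes whitespace tokenization and uses term presence per document in a 2x2 contingency table (term present/absent vs. positive/negative class). The function names `chi_square_scores` and `select_top_k` and the toy documents are made up for illustration; libraries may compute the statistic slightly differently (for example, from term counts).

```python
from collections import Counter


def chi_square_scores(pos_docs, neg_docs):
    """Score each term with the 2x2 chi-square statistic (term presence vs. class)."""
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    n = n_pos + n_neg
    # Document frequency of each term within each class (presence, not counts).
    pos_df = Counter(t for doc in pos_docs for t in set(doc.split()))
    neg_df = Counter(t for doc in neg_docs for t in set(doc.split()))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive docs containing the term
        b = neg_df[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the term carries no information (score 0).
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy data, purely illustrative.
pos_docs = ["great movie", "great acting"]
neg_docs = ["boring movie", "boring plot"]
scores = chi_square_scores(pos_docs, neg_docs)
top_terms = select_top_k(scores, 2)
```

Terms that occur equally often in both classes (like "movie" here) score zero, while terms that occur in only one class score highest, which is why the top-ranked terms make a useful reduced vocabulary.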
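To proceed with classification, you still need to train a model on those 100K features (for example, with an SVM). Once the model is learned, you can use it to classify new documents. The chi-square scores only decide which features to keep; the SVM learns its own decision boundary during training, so there is no chi-square threshold for the classification itself.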
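The whole flow (vectorize, select features by chi-square, train an SVM, classify) could look roughly like the following sketch. It assumes scikit-learn is available; the toy documents, the labels, and `k=4` are placeholders chosen for illustration, and in practice `k` would be something like the 100K mentioned above. Note that scikit-learn's `chi2` scores are computed from the feature values (here, term counts), not from term presence.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy training data, purely illustrative: 1 = positive, 0 = negative.
train_docs = [
    "great movie wonderful acting",
    "wonderful plot great fun",
    "boring movie terrible acting",
    "terrible plot boring mess",
]
train_labels = [1, 1, 0, 0]

pipeline = Pipeline([
    ("vectorize", CountVectorizer()),    # documents -> term-count vectors
    ("select", SelectKBest(chi2, k=4)),  # keep the 4 highest chi-square terms
    ("svm", LinearSVC()),                # linear SVM trained on the kept terms
])
pipeline.fit(train_docs, train_labels)

# Recover which terms survived the chi-square selection.
vocabulary = pipeline.named_steps["vectorize"].vocabulary_
kept_mask = pipeline.named_steps["select"].get_support()
kept_terms = sorted(term for term, idx in vocabulary.items() if kept_mask[idx])

# Classify unseen documents; the SVM applies its own learned boundary.
predictions = pipeline.predict(["great wonderful story", "boring terrible story"])
```

Putting the selection step inside the pipeline means the same reduced vocabulary is applied automatically at prediction time, and the selection is refit on the training folds only if you cross-validate the pipeline.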
Upvotes: 0