userAlma

Reputation: 53

How can I use chi-square values for text classification with an SVM?

I have both positive and negative training documents for a text classification problem. I plan to calculate the chi-square value of every feature in each document. Once I have those values, how do I proceed to classification with an SVM? What threshold value should I use for the classification?

Upvotes: 0

Views: 935

Answers (1)

greeness

Reputation: 16104

The chi-square value can be used for feature selection, as a pre-processing step. It is computed once per feature (term) over the whole training set, not per document, and measures how strongly the term's occurrence depends on the class label. Keeping only the top-scoring terms lets you greatly reduce your feature vocabulary (for example, selecting the most useful 100K terms from a 1M vocabulary). This step has two benefits: 1. a smaller model in the next step; 2. faster prediction. The downside: it may or may not affect classification performance.
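As a rough illustration of the selection step, here is a minimal sketch in plain Python. It assumes binary labels (1 = positive, 0 = negative), whitespace tokenization, and term presence/absence per document; the function names `chi_square_scores` and `top_k_terms` and the toy documents are made up for this example.

```python
from collections import Counter

def chi_square_scores(docs, labels):
    """Return {term: chi-square score} from a 2x2 term/class contingency table."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()  # document frequencies per class
    for text, label in zip(docs, labels):
        terms = set(text.lower().split())  # presence/absence, not counts
        (pos_df if label == 1 else neg_df).update(terms)

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive docs containing the term
        b = neg_df[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the term (or a class) never varies: score 0.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def top_k_terms(scores, k):
    """Keep the k terms with the highest chi-square score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy corpus, only to show the call pattern.
docs = [
    "great movie loved it",
    "great acting great plot",
    "terrible movie hated it",
    "terrible plot boring acting",
]
labels = [1, 1, 0, 0]

scores = chi_square_scores(docs, labels)
selected = top_k_terms(scores, 2)
```

Terms that appear equally often in both classes score 0 and are dropped first; terms concentrated in one class score highest and survive the cut.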

To proceed with classification, you still need to train a model on those 100K features (for example, with an SVM). Once the model is learned, you use it to classify new documents. The chi-square scores only decide which features the SVM sees; they are not themselves a classification threshold.
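One possible way to wire the two steps together is with scikit-learn, sketched below. The choice of `CountVectorizer`, `SelectKBest(chi2, k=2)`, and `LinearSVC` is an assumption for illustration (the answer does not name a library), and the toy documents and `k=2` are made up; on a real corpus `k` would be far larger. Note that scikit-learn's `chi2` works on term counts, so its scores can differ from a presence/absence calculation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training set: 1 = positive, 0 = negative.
train_docs = [
    "great movie loved it",
    "great acting great plot",
    "terrible movie hated it",
    "terrible plot boring acting",
]
train_labels = [1, 1, 0, 0]

model = make_pipeline(
    CountVectorizer(),       # documents -> term-count vectors
    SelectKBest(chi2, k=2),  # keep the k terms with the highest chi-square score
    LinearSVC(),             # linear SVM trained on the reduced features
)
model.fit(train_docs, train_labels)

new_docs = ["great plot", "terrible acting"]
predictions = model.predict(new_docs)
# Signed distance to the separating hyperplane; predict() uses the sign.
margins = model.decision_function(new_docs)
```

With this setup there is no chi-square threshold to pick for classification: the linear SVM assigns the positive class when its decision value is above 0, and the only tuning choice on the chi-square side is how many features to keep.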

Upvotes: 0
