user14738548

Reputation: 327

Must text-mining preprocessing be applied to the test set, or only to the training set?

I'm working on some text-mining tasks and have a simple question that I still can't settle.

I am applying preprocessing, such as tokenization and stemming, to my training set so that I can train my model.

Should I also apply this pre-processing to my test set?
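To make the question concrete, here is a minimal sketch of a preprocessing step that is defined once and then applied identically to both splits. The `preprocess` and `stem` functions are hypothetical: the suffix stripper is a crude stand-in for a real stemmer such as Porter, and the sample sentences are made up.

```python
import re

def stem(token: str) -> str:
    """Crude suffix stripper; a hypothetical stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least 3 characters remain, to avoid mangling short words.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on runs of letters, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens]

train_docs = ["The cats are playing", "Dogs barked loudly"]
test_docs = ["A cat plays"]

# The exact same function is applied to both splits.
train_tokens = [preprocess(d) for d in train_docs]
test_tokens = [preprocess(d) for d in test_docs]
print(train_tokens)  # [['the', 'cat', 'are', 'play'], ['dog', 'bark', 'loudly']]
print(test_tokens)   # [['a', 'cat', 'play']]
```

Because "plays" in the test sentence and "playing" in the training sentence both reduce to "play", the model can match them only if the test text goes through the same step.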

Upvotes: 0

Views: 244

Answers (2)

xiao

Reputation: 81

Of course you should. Otherwise, how would you feed your test data into your trained model? The model only knows the tokens that your training preprocessing produced.
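A small illustration of this point, under assumed data: a vocabulary is built from preprocessed training text, and the same test sentence is checked against it raw and preprocessed. The `preprocess` function (lowercasing plus stripping a plural "s") and `known_fraction` are hypothetical helpers for this sketch only.

```python
import re

def preprocess(text: str) -> list[str]:
    # Hypothetical minimal pipeline: lowercase, split into letter runs, strip a plural "s".
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

# Vocabulary learned from the preprocessed training text only.
vocab = set(preprocess("Cats chase mice. Dogs chase cats."))

def known_fraction(tokens: list[str]) -> float:
    """Share of tokens the trained model has a feature for."""
    return sum(t in vocab for t in tokens) / len(tokens)

test_text = "Dogs Chase Cats"
raw_tokens = test_text.split()        # no preprocessing
clean_tokens = preprocess(test_text)  # same preprocessing as training

print(known_fraction(raw_tokens))    # 0.0
print(known_fraction(clean_tokens))  # 1.0
```

Without preprocessing, none of the raw test tokens match the training vocabulary, even though the sentence uses exactly the same words.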

Upvotes: 0

berkayln

Reputation: 995

Yes, you should apply the same preprocessing to your test set. Your test set must represent your training set, which is why both should come from the same distribution and go through the same transformations. Let's think about it intuitively:

Suppose you are about to take an exam. For you to prepare and get a fair result, the lecturer should ask questions on the same subjects that were covered in the lectures. If the lecturer asks questions on entirely different subjects that no one has seen, no one can be expected to get a fair result.
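In practice, "apply the same things" usually means the common fit/transform pattern: anything the preprocessing learns from data (here, the vocabulary) is learned from the training set only, and the test set is then transformed with that same learned state. The sketch below assumes a hypothetical `BagOfWords` class and toy sentences; it is an illustration, not a specific library's implementation.

```python
import re
from collections import Counter

def preprocess(text: str) -> list[str]:
    # Hypothetical pipeline: lowercase + tokenize on letter runs (stemming omitted for brevity).
    return re.findall(r"[a-z]+", text.lower())

class BagOfWords:
    """Minimal vectorizer: the vocabulary is learned in fit() from training data only."""

    def fit(self, docs: list[str]) -> "BagOfWords":
        words = sorted({tok for doc in docs for tok in preprocess(doc)})
        self.vocab = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, docs: list[str]) -> list[list[int]]:
        rows = []
        for doc in docs:
            counts = Counter(preprocess(doc))
            # Words unseen during fit() are dropped, so train and test rows share one layout.
            rows.append([counts.get(w, 0) for w in self.vocab])
        return rows

train = ["good movie", "bad movie", "good acting"]
test = ["Good plot, bad acting"]

vec = BagOfWords().fit(train)  # fit on the training set only
X_train = vec.transform(train)
X_test = vec.transform(test)   # same preprocessing, same vocabulary
print(list(vec.vocab))  # ['acting', 'bad', 'good', 'movie']
print(X_test)           # [[1, 1, 1, 0]]
```

Fitting on the test set as well would leak information about it into the features. In scikit-learn the same pattern is calling `fit_transform` on the training set and `transform` on the test set with a vectorizer such as `CountVectorizer`.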

Upvotes: 1
