Character n-gram vs word features in NLP

Question

I'm trying to predict whether Yelp reviews are positive or negative by fitting a linear regression with SGD.
I tried two different feature extractors.
The first uses character n-grams; the second splits the text into words on spaces.
For the character n-grams, I tried several values of n and kept the one that gave the lowest test error.
That test error (0.27 on my test data) was nearly identical to the test error I got with the space-separated word features.

Is there a reason behind this coincidence?
Shouldn't the character n-grams give a lower test error, since they extract more features than the word features do?

Character n-grams (n=7, spaces removed): "Good restaurant" => "Goodres" "oodrest" "odresta" "drestau" "restaur" "estaura" "stauran" "taurant"

Word features: "Good restaurant" => "Good" "restaurant"
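
A minimal sketch of the two extractors illustrated above. It assumes whitespace is removed before the n-gram window slides over the text (which is what the n=7 example implies) and that each feature's value is its count in the review; the function names are hypothetical, chosen only for illustration.

```python
from collections import Counter


def extract_word_features(text):
    """Count whitespace-separated tokens (assumed feature value: raw count)."""
    return Counter(text.split())


def extract_char_ngram_features(text, n):
    """Count character n-grams after removing all whitespace.

    Assumption: spaces are dropped first, as in the "Good restaurant" example.
    Text shorter than n characters yields no features.
    """
    compact = "".join(text.split())
    return Counter(compact[i:i + n] for i in range(len(compact) - n + 1))


word_features = extract_word_features("Good restaurant")
ngram_features = extract_char_ngram_features("Good restaurant", 7)

print(list(word_features))
print(list(ngram_features))
```

For this input the n-gram extractor produces eight features and the word extractor two, so the n-gram feature vector is larger, although every 7-gram here is a substring of the same two words.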
