Ankit Katiyar

Reputation: 3001

Convert a 28 level categorical variable to matrix

I have a data set with a column company, and I want to fit a regression model on it.

Should I convert it using model.matrix, or just assign values from 1 to 28 in one column?

What is the point of converting it to 28 columns when the lm function can deal with it?

Upvotes: 1

Views: 463

Answers (1)

LyzandeR

Reputation: 37879

Should I convert it using model.matrix or just assign values from 1-28 in one column?

You should do neither:

  • If you assign values from 1 to 28 in one column, lm will treat company as a single numeric variable, which is like saying that company 28 has 28 times the weight of company 1. All the companies should have the same weight in your analysis (assuming these are company names with no ordinal relationship).
  • Using model.matrix will convert your company column into dummy variables (0/1 flags), but you do not need to do that yourself, since lm does it automatically for factor (or character) columns.
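
As a minimal sketch of the difference, assuming simulated data (the data frame df, its columns x and y, and the company labels are hypothetical and not from the question), compare what lm estimates in each case:

```r
set.seed(1)

# Hypothetical data: 28 companies with 10 observations each
df <- data.frame(
  company = factor(rep(paste0("company_", 1:28), each = 10)),
  x = rnorm(280)
)
df$y <- 2 * df$x + rnorm(280)

# company as a factor: lm() builds the dummy variables internally
fit_factor <- lm(y ~ x + company, data = df)

# company as integer codes: lm() fits one slope for the whole column
df$company_code <- as.integer(df$company)
fit_numeric <- lm(y ~ x + company_code, data = df)

length(coef(fit_factor))   # 29: intercept + x + 27 company dummies
length(coef(fit_numeric))  # 3: intercept + x + a single slope for company_code
```

With the factor version each company gets its own coefficient relative to the reference company, whereas the integer version forces a single linear trend across the arbitrary codes.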

What is the relevance of converting it to 28 columns when lm function can deal with it?

As I mentioned above, lm does that for you, so there is no need to do it yourself. However, note that you will end up with 27 dummy columns (plus the intercept), because one level (the reference level) is left out on purpose. The reason is that once you know the values of the other 27 dummies, you implicitly know the 28th as well: together with the intercept, the reference column is an exact linear combination of the other 27, so it has to be omitted.
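
The following sketch illustrates this with model.matrix, again using hypothetical company labels. It shows the 27 dummy columns plus the intercept, and that adding the reference dummy back does not increase the rank of the design matrix:

```r
# Hypothetical factor with 28 levels, one row per company
company <- factor(paste0("company_", 1:28))

# Design matrix that lm() would build for this term
X <- model.matrix(~ company)
ncol(X)              # 28: intercept + 27 dummy columns
levels(company)[1]   # the reference level, which has no column of its own

# Add the omitted reference dummy back by hand
ref_dummy <- as.integer(company == levels(company)[1])
X_full <- cbind(X, ref = ref_dummy)

ncol(X_full)         # 29 columns
qr(X_full)$rank      # 28: the extra column carries no new information
```

If you want a different company as the reference, relevel(company, ref = "...") changes which level is dropped.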

Upvotes: 1
