Trgovec

Reputation: 575

Regression of dummy variables in R

I am new to R and I am trying to perform a regression on my dataset, which includes e.g. monthly sales data of a company in different countries over multiple years.

In other statistical programs, in order to control for quarterly cyclical movement of sales as well as for the regional (country) differences, I would create dummy variables indicating e.g. quarters and countries where sales are made.

My questions:

1) I saw that in R you can set a variable's type to 'factor'. In that case, do I still need to create dummy variables indicating countries and months/quarters, or does R already treat factor variables differently and automatically convert them to dummies in the background?

2) If the above is not the case, and I indeed need to recode my values into 0,1 dummies, is there a neat standard way in R to do it?

Thanks a lot for your help and have a nice day!

Trgovec

Upvotes: 4

Views: 13629

Answers (2)

J.R.

Reputation: 3878

R will automatically build the corresponding design matrix from your formula (internally via model.matrix()), e.g.:

lm(mpg ~ factor(gear) + I(cyl > 4), data = mtcars)

If you would like to create the dummies yourself, take a look at model.matrix(). The "- 1" in the formula drops the intercept, so every level gets its own column:

head(model.matrix(~ -1 + factor(gear), data = mtcars))

                    factor(gear)3 factor(gear)4 factor(gear)5
Mazda RX4                       0             1             0
Mazda RX4 Wag                   0             1             0
Datsun 710                      0             1             0
Hornet 4 Drive                  1             0             0
Hornet Sportabout               1             0             0
Valiant                         1             0             0
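To make the difference between the two forms concrete, here is a minimal sketch using the same mtcars example. The object names mm_full and mm_ref are only illustrative. It compares the full set of dummies (no intercept) with the matrix a regression would normally use, where the first level is absorbed into the intercept:

```r
# Full set of dummies: one column per level, no intercept
mm_full <- model.matrix(~ -1 + factor(gear), data = mtcars)

# Default form: intercept plus dummies for all levels except the first
mm_ref <- model.matrix(~ factor(gear), data = mtcars)

colnames(mm_full)  # "factor(gear)3" "factor(gear)4" "factor(gear)5"
colnames(mm_ref)   # "(Intercept)"   "factor(gear)4" "factor(gear)5"

# Each row of the full dummy matrix has exactly one 1
all(rowSums(mm_full) == 1)  # TRUE
```

If you need the dummies as ordinary columns, cbind(mtcars, mm_full) would attach them to the data frame.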

Upvotes: 4

Oriol Mirosa

Reputation: 2826

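Yes, R automatically expands factor variables into dummy variables using treatment (reference-level) coding, so there is nothing else you need to do. When you run your regression, you should see the typical dummy-variable output: one coefficient for each level of a factor except its reference level, which by default is the first level.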
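As a rough illustration for the setup in the question, here is a sketch on made-up data. The data frame sales_df, its column names, and the country codes are all assumptions invented for the example, not anything from your dataset:

```r
# Hypothetical toy data: 24 months of sales for each of three countries
set.seed(1)
sales_df <- data.frame(
  country = factor(rep(c("AT", "DE", "SI"), each = 24)),
  month   = rep(1:12, times = 6)
)

# Derive a quarter factor (Q1..Q4) from the month number
sales_df$quarter <- factor(paste0("Q", (sales_df$month - 1) %/% 3 + 1))

# Simulated sales with a country effect plus noise
sales_df$sales <- 100 + 10 * as.integer(sales_df$country) + rnorm(nrow(sales_df))

# Factors go straight into the formula; lm() builds the dummies itself
fit <- lm(sales ~ country + quarter, data = sales_df)

names(coef(fit))
# "(Intercept)" "countryDE" "countrySI" "quarterQ2" "quarterQ3" "quarterQ4"
```

The reference levels here (AT and Q1) have no coefficient of their own; they are absorbed into the intercept, and the other coefficients are differences relative to them.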

Notice, however, that there are several ways of coding categorical variables (treatment, sum-to-zero, Helmert, polynomial, and so on), so you might want something other than the default, which you can set with the C() function. You can find good details here. Also, there are packages devoted to helping you create dummy variables if you need more control, such as the dummies package.
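A small sketch of what changing the coding can look like, again on mtcars. The variable names are illustrative, and sum-to-zero coding is just one possible alternative picked for the example:

```r
gear_f <- factor(mtcars$gear)

# Default treatment coding: level "3" is the reference
contrasts(gear_f)

# Keep treatment coding but make "4" the reference level
gear_ref4 <- relevel(gear_f, ref = "4")
levels(gear_ref4)  # "4" "3" "5"

# Switch to sum-to-zero (deviation) coding with C()
gear_sum <- C(gear_f, contr.sum)
contrasts(gear_sum)  # each column sums to zero

# The recoded factor is used in a formula like any other factor
fit_sum <- lm(mpg ~ gear_sum, data = mtcars)
names(coef(fit_sum))
```

With sum-to-zero coding the coefficients are deviations from the mean of the level means rather than differences from a reference level, so the interpretation of the output changes even though the fitted values do not.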

Upvotes: 5
