Reputation: 4575
Hi, I'm new to R and I was hoping someone could give me some tips on building a regression model. I have data similar to the sample below, which contains categorical variables such as Path. I would like to convert these categorical variables to binary (dummy) variables, as in the "Transformed Data" shown below, so that I can use them in a regression model to predict WaitTime from Volume on each Path. In Python, pandas has a function called get_dummies that does this nicely. Is there a similar function in R, or another way to build a regression model with categorical variables? My end goal is to build the regression model and then find the Volume value for each Path that minimizes WaitTime. Any tips on that would also be appreciated.
Sample Data:
Path WaitTime Volume
AD_IB 195 3
GMC_DT 154 4
CD_ADT 192 2
Ord_IB 326 1
Transformed Data:
AD_IB GMC_DT CD_ADT Ord_IB WaitTime Volume
1 0 0 0 195 3
0 1 0 0 154 4
0 0 1 0 192 2
0 0 0 1 326 1
Upvotes: 1
Views: 361
Reputation: 20399
R does that automatically for you when you put a factor in the model formula:
set.seed(1)
d <- data.frame(cat = factor(LETTERS[sample(3, 100, TRUE)]), y = rnorm(100))
lm(y ~ cat, d)
#
# Call:
# lm(formula = y ~ cat, data = d)
#
# Coefficients:
# (Intercept)         catB         catC
#     -0.2385       0.3518       0.2493
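If you do want the explicit 0/1 columns from the question (roughly what get_dummies gives you in pandas), base R's model.matrix() can build them. The following is only a minimal sketch: the data frame df is hypothetical and simply retypes the four sample rows from the question, and removing the intercept with "- 1" is one way among several to get a column for every level.

```r
# Hypothetical data frame mirroring the sample data in the question
df <- data.frame(
  Path     = factor(c("AD_IB", "GMC_DT", "CD_ADT", "Ord_IB"),
                    levels = c("AD_IB", "GMC_DT", "CD_ADT", "Ord_IB")),
  WaitTime = c(195, 154, 192, 326),
  Volume   = c(3, 4, 2, 1)
)

# "~ Path - 1" drops the intercept, so every level gets its own 0/1 column
dummies <- model.matrix(~ Path - 1, data = df)

# model.matrix() names the columns "PathAD_IB", ...; keep only the level names
colnames(dummies) <- levels(df$Path)

# Put the indicator columns next to the numeric columns
transformed <- cbind(as.data.frame(dummies), df[c("WaitTime", "Volume")])
print(transformed)
```

For the regression itself you normally do not need this step: with real data that has several observations per path, a formula such as WaitTime ~ Path + Volume passed to lm() does the same encoding internally, using the first factor level as the baseline. The four sample rows alone are too few to fit that model.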
Upvotes: 2