momo

Reputation: 5

Creating a matrix of dummies for each hour in R?

My data set spans one year. It originally had quarter-hourly values, which I aggregated into hours. So now I have 8760 hourly values, with the hour of day always running from 0 to 23. For a linear regression, I would now like to create a dummy matrix with 24 rows and 24 columns, something like this:

hour 1 hour 2 ...
1 0
0 1

I tried it with different functions, but nothing works. I hope someone can help me. This is the code for the current data set:

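library(dplyr)      # mutate, group_by, summarise, %>%
library(lubridate)  # hour, floor_date

data$time <- substr(data$x, 1, 16)
# Note: this overwrites the string created on the previous line
data$time <- as.POSIXct(data$start_time, format = "%d.%m.%Y %H:%M", tz = "UTC")
data <- subset(data, select = -c(x, y))
data$hour <- hour(data$time)
head(data)
# Aggregate the quarter-hourly values to hourly sums
df <- data %>%
  mutate(data_aggregate = floor_date(time, unit = "hour")) %>%
  group_by(data_aggregate) %>%
  summarise(W = sum(W, na.rm = TRUE))
# Hour of day (0-23) as a factor
df1 <- df %>% mutate(hour = as.factor(hour(data_aggregate)))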

Answers (1)

Harshvardhan

Reputation: 559

As @Onyambu explained, you don't need to build the dummies yourself. When you fit a linear regression (or another statistical model) in R and include a factor variable as a predictor, R automatically generates a dummy variable for each level of the factor except one, which serves as the reference level. This is known as dummy coding (R's default treatment contrasts). It is not quite the same as one-hot encoding, which keeps a column for every level.

In your case, once hour is a factor, R will create 23 dummy variables: there are 24 hours, and one of them (by default the first level, here hour 0) is used as the reference level and absorbed into the intercept.

df1$hour <- as.factor(df1$hour)
model = lm(W ~ hour, data = df1)
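
To see what lm() does with the factor, here is a minimal sketch on made-up data. The data frame toy and its random W values are assumptions for illustration only, not your data.

```r
# Hypothetical toy data: 3 days of hourly observations with random W values
set.seed(1)
toy <- data.frame(
  hour = factor(rep(0:23, times = 3)),
  W    = rnorm(72)
)

fit <- lm(W ~ hour, data = toy)

# One intercept plus 23 hour dummies; the first level is the reference
length(coef(fit))       # 24
names(coef(fit))[1:3]   # "(Intercept)" "hour1" "hour2"
levels(toy$hour)[1]     # "0"
```

Each hourN coefficient is the difference between hour N and the reference hour 0. If another reference hour is more natural, relevel(toy$hour, ref = "12") changes it before fitting.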

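If you still want to create the dummy variables yourself, here's how to do it. This is not recommended: putting all 24 dummy columns into a model that also has an intercept makes the predictors perfectly collinear.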

To create a dummy matrix using base R's model.matrix function, you can use:

df1$hour <- as.factor(df1$hour)
dummy_matrix <- model.matrix(~hour-1, data = df1) 

The ~hour-1 formula builds a model matrix from the hour variable without an intercept (-1). Dropping the intercept is what makes model.matrix keep all 24 hour columns instead of leaving out the reference level.

You can reattach this matrix to the original data frame using cbind().
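
As a rough sketch of both steps on made-up data (the toy data frame, its dates, and its W values are assumptions for illustration), note that the result has one row per observation, so your data would give 8760 rows and 24 columns rather than 24 by 24:

```r
# Hypothetical toy data: 48 consecutive hourly timestamps
toy <- data.frame(
  time = seq(as.POSIXct("2023-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 48),
  W    = seq_len(48)
)
# Hour of day as a factor with all 24 levels declared explicitly
toy$hour <- factor(as.integer(format(toy$time, "%H")), levels = 0:23)

# No intercept, so every hour gets its own 0/1 column
dummy_matrix <- model.matrix(~ hour - 1, data = toy)
dim(dummy_matrix)                 # 48 24
all(rowSums(dummy_matrix) == 1)   # TRUE: exactly one hour is set per row

# Attach the dummy columns (hour0 ... hour23) to the original data
toy_wide <- cbind(toy, dummy_matrix)
```

Declaring levels = 0:23 keeps all 24 columns even if some hour never occurs in the data.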
