Carbo

Reputation: 916

How to create dummy variables with some conditions using a loop in R?

I am trying to create a set of dummy variables, each defined by two conditions that depend on the variable's name, but I am not sure how to proceed.

I have the following dataset, "dat":

  ID Entry   Exit   y2000 y2001 y2002 y2003 ....
  1   1999  2010     0      0     0     0
  2   2000  2001     0   ......
  3   2002  2003     0  ........
  4   1999  2002
  5   .....

At the moment all the y"i" variables are equal to 0. What I want is to assign the value 1 to variable y2000 if Entry is lower than or equal to 2000 and Exit is higher than or equal to 2000. Similarly, for variable y2001 I want to assign the value 1 if Entry is lower than or equal to 2001 and Exit is higher than or equal to 2001, and so on.

I can do it for a single variable as follows:

      dat$y2000[dat$Exit >= 2000 & dat$Entry <= 2000] <- 1

but I'd like to do this in a loop for every variable of the form y"i". How can I do that?

Thank you in advance for your help.
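For reference, the loop described in the question can be written directly in base R. This is a minimal sketch, assuming the columns are named exactly Entry and Exit (R column names are case-sensitive) and that every dummy column is "y" followed by four digits; the names year_cols, yr and active are illustrative only, and the Entry/Exit values are the ones shown in the question.

```r
# Example data shaped like the question's "dat" (Entry/Exit from the question)
dat <- data.frame(
  ID    = 1:4,
  Entry = c(1999L, 2000L, 2002L, 1999L),
  Exit  = c(2010L, 2001L, 2003L, 2002L),
  y2000 = 0L, y2001 = 0L, y2002 = 0L, y2003 = 0L
)

# Illustrative name: every column called "y" followed by four digits
year_cols <- grep("^y\\d{4}$", names(dat), value = TRUE)

for (col in year_cols) {
  yr <- as.integer(sub("^y", "", col))        # e.g. "y2001" -> 2001
  active <- dat$Entry <= yr & dat$Exit >= yr  # TRUE where Entry <= yr <= Exit
  dat[[col]][active] <- 1L                    # set the dummy to 1 for those rows
}

dat
```

The loop reuses the single-variable assignment from the question and only changes which column and which year it uses on each pass.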

Upvotes: 1

Views: 174

Answers (1)

akrun

Reputation: 887501

We can do this with Map. Get the vector of 'y' column names with grep ('nm1'), extract the numeric part of each name ('nm2'), then use Map to replace the values in each 'y' column wherever the logical condition built from the 'Entry'/'Exit' columns is TRUE, and assign the results back to the 'y' columns of the original dataset.

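# names of the year columns, e.g. "y2000", "y2001", ...
nm1 <- grep("^y\\d{4}$", names(dat), value = TRUE)
# the matching years as integers, e.g. 2000, 2001, ...
nm2 <- as.integer(sub("y", "", nm1))

# for each (column, year) pair, set the value to 1 where Entry <= year <= Exit
dat[nm1] <- Map(function(x, y) replace(dat[[x]], 
             dat$Exit >= y & dat$Entry <= y, 1), nm1, nm2)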
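Another option, not part of the original answer: since each dummy is just a comparison between every row's Entry/Exit and every year, base R's outer() can build all of them at once without Map or a loop. A minimal sketch, assuming the same column names as the data below; years and active are illustrative names.

```r
dat <- data.frame(
  ID    = c(1L, 2L, 3L, 5L),
  Entry = c(1999L, 2000L, 2002L, 1999L),
  Exit  = c(2010L, 2001L, 2003L, 2002L),
  y2000 = 0L, y2001 = 0L, y2002 = 0L, y2003 = 0L
)

nm1 <- grep("^y\\d{4}$", names(dat), value = TRUE)
years <- as.integer(sub("^y", "", nm1))

# Each outer() call gives a rows-by-years logical matrix; combining them
# with & is TRUE where the year lies between Entry and Exit (inclusive).
active <- outer(dat$Entry, years, "<=") & outer(dat$Exit, years, ">=")

# Unary + turns TRUE/FALSE into 1L/0L; assigning the matrix to the
# selected columns fills them column by column.
dat[nm1] <- +active

dat
```

Unlike the replace() version, this overwrites every y column with 0/1 instead of only setting the 1s, which gives the same result here because the columns start at 0.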

Or using tidyverse

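library(tidyverse)
dat %>%
   # long format: one row per ID and year column
   gather(key, val, matches("^y")) %>%
   # year from the column name, then 1 if Entry <= year <= Exit, else 0
   mutate(colNum = readr::parse_number(key),
          val =  +(Exit >= colNum & Entry <= colNum)) %>% 
   select(-colNum) %>% 
   # back to wide format, one column per year
   spread(key, val)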

data

dat <- structure(list(ID = c(1L, 2L, 3L, 5L), Entry = c(1999L, 2000L, 
  2002L, 1999L), Exit = c(2010L, 2001L, 2003L, 2002L), y2000 = c(0L, 
  0L, 0L, 0L), y2001 = c(0L, 0L, 0L, 0L), y2002 = c(0L, 0L, 0L, 
  0L), y2003 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
  -4L))

Upvotes: 1
