theidentity
theidentity

Reputation: 13

Multi-conditional mutate

I have a data frame that requires conditional recoding of a column based on the date listed in certain rows for each subset of IDs. I am trying to figure out how to best achieve this using the mutate function in dplyr. Suggestions and alternate solutions are welcome, but I would like to avoid using for loops.

I know how to write a really verbose and inefficient for loop that would solve this problem, but would like to know how to do it more efficiently.

The sample data frame:

df<-data.frame(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
               date = as.Date(c("2016-02-01","2016-02-01","2016-02-01","2016-03-21", "2016-03-21", "2016-03-21", "2016-10-05", "2016-10-05", "2016-10-05", "2016-10-05", "2016-03-01","2016-03-01","2016-03-01","2016-04-21", "2016-04-21", "2016-04-21", "2016-11-05", "2016-11-05", "2016-11-05", "2016-11-05")),
               trial = c(NA, NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, NA, NA, NA)

My pseudo code - the second logical argument in the first two case_when statements is where I am stuck.

df%>%
  group_by(ID)%>%
  mutate(results = case_when(
     is.na(trial) & date < date where trial = 1 ~ 0,
     is.na(trial) & date > date where trial = 1 ~ 2,
     trial == trial
  ))

The expected result being:

data.frame(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
               date = as.Date(c("2016-02-01","2016-02-01","2016-02-01","2016-03-21", "2016-03-21", "2016-03-21", "2016-10-05", "2016-10-05", "2016-10-05", "2016-10-05", "2016-03-01","2016-03-01","2016-03-01","2016-04-21", "2016-04-21", "2016-04-21", "2016-11-05", "2016-11-05", "2016-11-05", "2016-11-05")),
               trial = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2)
)

Upvotes: 1

Views: 156

Answers (2)

akrun
akrun

Reputation: 887721

An option would be to group by 'ID' and transform the 'trial' by applying the run-length-id on (rleid) on the 'trial' column

library(dplyr)
library(data.table)
df %>%
   group_by(ID) %>% 
   mutate(trial = rleid(trial)-1)
# A tibble: 20 x 3
# Groups:   ID [2]
#      ID date       trial
#   <dbl> <date>     <dbl>
# 1     1 2016-02-01     0
# 2     1 2016-02-01     0
# 3     1 2016-02-01     0
# 4     1 2016-03-21     1
# 5     1 2016-03-21     1
# 6     1 2016-03-21     1
# 7     1 2016-10-05     2
# 8     1 2016-10-05     2
# 9     1 2016-10-05     2
#10     1 2016-10-05     2
#11     2 2016-03-01     0
#12     2 2016-03-01     0
#13     2 2016-03-01     0
#14     2 2016-04-21     1
#15     2 2016-04-21     1
#16     2 2016-04-21     1
#17     2 2016-11-05     2
#18     2 2016-11-05     2
#19     2 2016-11-05     2
#20     2 2016-11-05     2

Or using rle

df %>% 
  group_by(ID) %>%
  mutate(trial = with(rle(is.na(trial)), 
             rep(seq_along(values), lengths))-1)

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 389215

Converting your pseudo code to code we can use which.max(trial == 1) to get first occurrence where trial = 1 for each group. This also assumes that there would be at least one entry of 1 in trial for each ID.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(trial = case_when(is.na(trial) & date < date[which.max(trial == 1)] ~ 0, 
                             is.na(trial) & date > date[which.max(trial == 1)] ~ 2, 
                             TRUE ~ trial))


#      ID date       trial
#    <dbl> <date>     <dbl>
# 1     1 2016-02-01     0
# 2     1 2016-02-01     0
# 3     1 2016-02-01     0
# 4     1 2016-03-21     1
# 5     1 2016-03-21     1
# 6     1 2016-03-21     1
# 7     1 2016-10-05     2
# 8     1 2016-10-05     2
# 9     1 2016-10-05     2
#10     1 2016-10-05     2
#11     2 2016-03-01     0
#12     2 2016-03-01     0
#13     2 2016-03-01     0
#14     2 2016-04-21     1
#15     2 2016-04-21     1
#16     2 2016-04-21     1
#17     2 2016-11-05     2
#18     2 2016-11-05     2
#19     2 2016-11-05     2
#20     2 2016-11-05     2

Upvotes: 1

Related Questions