karuno
karuno

Reputation: 411

mutate variable based on other columns with similar names

I have a df here (the desired output, my starting df does not have the Flag variable):

df <- data.frame(
  Person = c('1','2','3'),
  Date = as.Date(c('2010-09-30', '2012-11-20', '2015-03-11')),
  Treatment_1 = as.Date(c('2010-09-30', '2012-11-21', '2015-03-22')),
  Treatment_2 = as.Date(c('2011-09-30', 'NA', '2011-03-22')),
  Treatment_3 = as.Date(c('2012-09-30', '2015-11-21', '2015-06-22')),
  Surgery_1 = as.Date(c(NA, '2016-11-21', '2015-03-12')),
  Surgery_2 = as.Date(c(NA, '2017-11-21', '2019-03-12')),
  Surgery_3 = as.Date(c(NA, '2018-11-21', '2013-03-12')),
  Flag = c('', 'Y', '') 
)

and I want to derive the Flag variable based on these conditions:

  1. For any column that starts with Treatment, set Flag to "" if Date = Treatment
  2. For any column that starts with Surgery, set Flag to "" if Date = Surgery OR Date = Surgery +1 OR Date = Surgery - 1 (basically if the Surgery date is on the day, one day before, or one day after the Date variable, set Flag to "").
  3. else set Flag = "Y"

I've looked into mutate_at but that rewrites the variables and assigns values of True/False.

This is wrong but this is my attempt:

df2 <- df %>%
  mutate(Flag = case_when(
    vars(starts_with("Treatment"), Date == . ) ~ '',
    vars(starts_with("Surgery"), Date == . | Date == . - 1 | Date == . + 1) ~ '',
    TRUE ~ 'Y')
  )

UPDATE 2022-Aug-22

When I change a cell with the same date as the one in row 2:

df <- data.frame(
  Person = c('1','2','3'),
  Date = as.Date(c('2010-09-30', '2012-11-20', '2015-03-11')),
  Treatment_1 = as.Date(c('2010-09-30', '2012-11-21', '2015-03-22')),
  Treatment_2 = as.Date(c('2011-09-30', 'NA', '2011-03-22')),
  Treatment_3 = as.Date(c('2012-09-30', '2015-11-21', '2015-06-22')),
  Surgery_1 = as.Date(c(NA, '2016-11-21', '2015-03-12')),
  Surgery_2 = as.Date(c(NA, '2017-11-21', '2019-03-12')),
  Surgery_3 = as.Date(c(NA, '2018-11-21', '2012-11-20')),
  Flag = c('', 'Y', '') 
)

and then re-run the base R solution, the Flag in the second row is no longer "Y" but it should be as in that row, it doesn't meet any of the above conditions.

Upvotes: 3

Views: 1247

Answers (3)

AndrewGB
AndrewGB

Reputation: 16876

We can use rowwise and c_across along with any for each condition in case_when. Then, we can make a list for the Date (and +1, -1 days) for Surgery to match.

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(Flag = case_when(
    any(c_across(starts_with("Treatment")) == Date) ~ "",
    any(c_across(starts_with("Surgery")) %in% c(Date, (Date +1), (Date-1))) ~ "",
    TRUE ~ "Y"
  ))

Output

  Person Date       Treatment_1 Treatment_2 Treatment_3 Surgery_1  Surgery_2  Surgery_3  Flag 
  <chr>  <date>     <date>      <date>      <date>      <date>     <date>     <date>     <chr>
1 1      2010-09-30 2010-09-30  2011-09-30  2012-09-30  NA         NA         NA         ""   
2 2      2012-11-20 2012-11-21  NA          2015-11-21  2016-11-21 2017-11-21 2018-11-21 "Y"  
3 3      2015-03-11 2015-03-22  2011-03-22  2015-06-22  2015-03-12 2019-03-12 2013-03-12 "" 

Update

Here is a possible base R solution that is a lot quicker than tidyverse. This could be done in one line of code, but I decided that readability is better. First, I duplicate the Surgery columns so that we have +1 day and -1 day, and then convert these columns to character. Then, I subset the Treatment columns and convert to character. I convert to character as you cannot compare Date with %in% or ==. Then, I bind the date, treatment, and surgery columns together (a). Then, I use an ifelse for if the Date is in any of the columns but doing it row by row with apply, then we return "" and if not then return Y. Then, I bind the result back to the original dataframe (minus Flag from your original dataframe).

dup_names <- colnames(df)[startsWith(colnames(df), "Surgery")]

surgery <-
  cbind(df[dup_names], setNames(df[dup_names] + 1, paste0(dup_names, "_range1")))

surgery <-
  sapply(cbind(surgery, setNames(df[dup_names] - 1, paste0(
    dup_names, "_range2"
  ))), as.character)

treatment <-
  sapply(df[startsWith(colnames(df), "Treatment")], as.character)

a <- cbind(Date = as.character(df$Date), treatment, surgery)

cbind(subset(df, select = -Flag),
      Flag = ifelse(apply(a[,1]==a[,2:ncol(a)], 1, any, na.rm = TRUE), "", "Y"))

Benchmark

enter image description here

Upvotes: 3

langtang
langtang

Reputation: 24867

Updated to add data.table approach

If you want a data.table approach, here it is:

df[melt(df, id=c(1,2))[,flag:=fifelse(
  (str_starts(variable,"T") & value==Date) | 
    (str_starts(variable,"S") & abs(value-Date)<=1),"", "Y")][
      , .(flag=min(flag,na.rm=T)), Person], on=.(Person)]

Output


   Person       Date Treatment_1 Treatment_2 Treatment_3  Surgery_1  Surgery_2  Surgery_3 flag
1:      1 2010-09-30  2010-09-30  2011-09-30  2012-09-30       <NA>       <NA>       <NA>     
2:      2 2012-11-20  2012-11-21        <NA>  2015-11-21 2016-11-21 2017-11-21 2018-11-21    Y
3:      3 2015-03-11  2015-03-22  2011-03-22  2015-06-22 2015-03-12 2019-03-12 2013-03-12     

I like Andrew's approach, but I was working on this when his answer came in, so here it is in case you are interested

df %>% inner_join(
  pivot_longer(df, cols=Treatment_1:Surgery_3) %>% 
    mutate(flag=case_when(
        (str_starts(name,"T") & value==Date) | (str_starts(name,"S") & abs(value-Date)<=1) ~ "",
        TRUE ~"Y")) %>% 
    group_by(Person) %>% 
    summarize(flag = min(flag))
)

Output:

  Person       Date Treatment_1 Treatment_2 Treatment_3  Surgery_1  Surgery_2  Surgery_3 flag
1      1 2010-09-30  2010-09-30  2011-09-30  2012-09-30       <NA>       <NA>       <NA>     
2      2 2012-11-20  2012-11-21        <NA>  2015-11-21 2016-11-21 2017-11-21 2018-11-21    Y
3      3 2015-03-11  2015-03-22  2011-03-22  2015-06-22 2015-03-12 2019-03-12 2013-03-12     

Upvotes: 2

TarJae
TarJae

Reputation: 79311

Here is an alternative using across approach:

library(tidyverse)

df %>% 
  mutate(across(starts_with("Treatment"), ~as.numeric(. %in% Date), .names ="new_{.col}"),
         across(starts_with("Surgery"), ~as.numeric(. %in% c(Date, Date+1, Date-1)), .names ="new_{.col}")) %>% 
  mutate(Flag = ifelse(rowSums(select(., contains('new')))==1, "", "Y"), .keep="used") %>% 
  bind_cols(df)
  Flag Person       Date Treatment_1 Treatment_2 Treatment_3  Surgery_1  Surgery_2  Surgery_3
1           1 2010-09-30  2010-09-30  2011-09-30  2012-09-30       <NA>       <NA>       <NA>
2    Y      2 2012-11-20  2012-11-21        <NA>  2015-11-21 2016-11-21 2017-11-21 2018-11-21
3           3 2015-03-11  2015-03-22  2011-03-22  2015-06-22 2015-03-12 2019-03-12 2013-03-12

Upvotes: 2

Related Questions