Reputation: 411
I have a df here (the desired output, my starting df does not have the Flag variable):
df <- data.frame(
Person = c('1','2','3'),
Date = as.Date(c('2010-09-30', '2012-11-20', '2015-03-11')),
Treatment_1 = as.Date(c('2010-09-30', '2012-11-21', '2015-03-22')),
Treatment_2 = as.Date(c('2011-09-30', 'NA', '2011-03-22')),
Treatment_3 = as.Date(c('2012-09-30', '2015-11-21', '2015-06-22')),
Surgery_1 = as.Date(c(NA, '2016-11-21', '2015-03-12')),
Surgery_2 = as.Date(c(NA, '2017-11-21', '2019-03-12')),
Surgery_3 = as.Date(c(NA, '2018-11-21', '2013-03-12')),
Flag = c('', 'Y', '')
)
and I want to derive the Flag variable based on these conditions:
I've looked into mutate_at but that rewrites the variables and assigns values of True/False.
This is wrong but this is my attempt:
df2 <- df %>%
mutate(Flag = case_when(
vars(starts_with("Treatment"), Date == . ) ~ '',
vars(starts_with("Surgery"), Date == . | Date == . - 1 | Date == . + 1) ~ '',
TRUE ~ 'Y')
)
UPDATE 2022-Aug-22
When I change a cell with the same date as the one in row 2:
df <- data.frame(
Person = c('1','2','3'),
Date = as.Date(c('2010-09-30', '2012-11-20', '2015-03-11')),
Treatment_1 = as.Date(c('2010-09-30', '2012-11-21', '2015-03-22')),
Treatment_2 = as.Date(c('2011-09-30', 'NA', '2011-03-22')),
Treatment_3 = as.Date(c('2012-09-30', '2015-11-21', '2015-06-22')),
Surgery_1 = as.Date(c(NA, '2016-11-21', '2015-03-12')),
Surgery_2 = as.Date(c(NA, '2017-11-21', '2019-03-12')),
Surgery_3 = as.Date(c(NA, '2018-11-21', '2012-11-20')),
Flag = c('', 'Y', '')
)
and then re-run the base R solution, the Flag in the second row is no longer "Y" but it should be as in that row, it doesn't meet any of the above conditions.
Upvotes: 3
Views: 1247
Reputation: 16876
We can use rowwise
and c_across
along with any
for each condition in case_when
. Then, we can make a list for the Date
(and +1, -1 days) for Surgery
to match.
library(tidyverse)
df %>%
rowwise() %>%
mutate(Flag = case_when(
any(c_across(starts_with("Treatment")) == Date) ~ "",
any(c_across(starts_with("Surgery")) %in% c(Date, (Date +1), (Date-1))) ~ "",
TRUE ~ "Y"
))
Output
Person Date Treatment_1 Treatment_2 Treatment_3 Surgery_1 Surgery_2 Surgery_3 Flag
<chr> <date> <date> <date> <date> <date> <date> <date> <chr>
1 1 2010-09-30 2010-09-30 2011-09-30 2012-09-30 NA NA NA ""
2 2 2012-11-20 2012-11-21 NA 2015-11-21 2016-11-21 2017-11-21 2018-11-21 "Y"
3 3 2015-03-11 2015-03-22 2011-03-22 2015-06-22 2015-03-12 2019-03-12 2013-03-12 ""
Update
Here is a possible base R solution that is a lot quicker than tidyverse
. This could be done in one line of code, but I decided that readability is better. First, I duplicate the Surgery
columns so that we have +1 day and -1 day, and then convert these columns to character. Then, I subset the Treatment
columns and convert to character. I convert to character as you cannot compare Date
with %in%
or ==
. Then, I bind the date, treatment, and surgery columns together (a
). Then, I use an ifelse
for if the Date
is in any of the columns but doing it row by row with apply
, then we return ""
and if not then return Y
. Then, I bind the result back to the original dataframe (minus Flag
from your original dataframe).
dup_names <- colnames(df)[startsWith(colnames(df), "Surgery")]
surgery <-
cbind(df[dup_names], setNames(df[dup_names] + 1, paste0(dup_names, "_range1")))
surgery <-
sapply(cbind(surgery, setNames(df[dup_names] - 1, paste0(
dup_names, "_range2"
))), as.character)
treatment <-
sapply(df[startsWith(colnames(df), "Treatment")], as.character)
a <- cbind(Date = as.character(df$Date), treatment, surgery)
cbind(subset(df, select = -Flag),
Flag = ifelse(apply(a[,1]==a[,2:ncol(a)], 1, any, na.rm = TRUE), "", "Y"))
Benchmark
Upvotes: 3
Reputation: 24867
If you want a data.table approach, here it is:
df[melt(df, id=c(1,2))[,flag:=fifelse(
(str_starts(variable,"T") & value==Date) |
(str_starts(variable,"S") & abs(value-Date)<=1),"", "Y")][
, .(flag=min(flag,na.rm=T)), Person], on=.(Person)]
Output
Person Date Treatment_1 Treatment_2 Treatment_3 Surgery_1 Surgery_2 Surgery_3 flag
1: 1 2010-09-30 2010-09-30 2011-09-30 2012-09-30 <NA> <NA> <NA>
2: 2 2012-11-20 2012-11-21 <NA> 2015-11-21 2016-11-21 2017-11-21 2018-11-21 Y
3: 3 2015-03-11 2015-03-22 2011-03-22 2015-06-22 2015-03-12 2019-03-12 2013-03-12
I like Andrew's approach, but I was working on this when his answer came in, so here it is in case you are interested
df %>% inner_join(
pivot_longer(df, cols=Treatment_1:Surgery_3) %>%
mutate(flag=case_when(
(str_starts(name,"T") & value==Date) | (str_starts(name,"S") & abs(value-Date)<=1) ~ "",
TRUE ~"Y")) %>%
group_by(Person) %>%
summarize(flag = min(flag))
)
Output:
Person Date Treatment_1 Treatment_2 Treatment_3 Surgery_1 Surgery_2 Surgery_3 flag
1 1 2010-09-30 2010-09-30 2011-09-30 2012-09-30 <NA> <NA> <NA>
2 2 2012-11-20 2012-11-21 <NA> 2015-11-21 2016-11-21 2017-11-21 2018-11-21 Y
3 3 2015-03-11 2015-03-22 2011-03-22 2015-06-22 2015-03-12 2019-03-12 2013-03-12
Upvotes: 2
Reputation: 79311
Here is an alternative using across
approach:
library(tidyverse)
df %>%
mutate(across(starts_with("Treatment"), ~as.numeric(. %in% Date), .names ="new_{.col}"),
across(starts_with("Surgery"), ~as.numeric(. %in% c(Date, Date+1, Date-1)), .names ="new_{.col}")) %>%
mutate(Flag = ifelse(rowSums(select(., contains('new')))==1, "", "Y"), .keep="used") %>%
bind_cols(df)
Flag Person Date Treatment_1 Treatment_2 Treatment_3 Surgery_1 Surgery_2 Surgery_3
1 1 2010-09-30 2010-09-30 2011-09-30 2012-09-30 <NA> <NA> <NA>
2 Y 2 2012-11-20 2012-11-21 <NA> 2015-11-21 2016-11-21 2017-11-21 2018-11-21
3 3 2015-03-11 2015-03-22 2011-03-22 2015-06-22 2015-03-12 2019-03-12 2013-03-12
Upvotes: 2