Reputation: 585
I have a data frame as follows:
Sample_ID<-c("a1","a2","a3","a4","a5","a6")
Heart_attack<-c("2010/04/13", "2008/07/30", "2009/03/06", "2008/08/22", "2009/06/24", "2008/08/26")
Stroke<-c("2007/05/17", "2009/05/16", "2007/05/16", "2007/05/16","2007/05/16", "2010/05/16")
DF<-data.frame(Sample_ID,Heart_attack,Stroke)
I need to add two columns to this data frame. The first, CVD_date, should hold whichever of the Heart_attack and Stroke dates occurred earlier. The second, CVD, should be 1 if the date in CVD_date is the Heart_attack date and 2 otherwise.
For example, I am looking for the following output:
Sample_ID Heart_attack Stroke     CVD_date   CVD
a1        2010/04/13   2007/05/17 2007/05/17 2
a2        2008/07/30   2009/05/16 2008/07/30 1
a3        2009/03/06   2007/05/16 2007/05/16 2
How can I do this in R?
Upvotes: 0
Views: 319
Reputation: 388817
You can use pmin to get the earlier of the Heart_attack and Stroke dates for each row. For CVD, we compare the two dates, convert the logical values to integer, and add 1, which gives 1 if Heart_attack is on or before Stroke and 2 otherwise.
library(dplyr)
DF %>%
  mutate(across(-1, lubridate::ymd),
         CVD_date = pmin(Heart_attack, Stroke),
         CVD = as.integer(Heart_attack > Stroke) + 1)
# Sample_ID Heart_attack Stroke CVD_date CVD
#1 a1 2010-04-13 2007-05-17 2007-05-17 2
#2 a2 2008-07-30 2009-05-16 2008-07-30 1
#3 a3 2009-03-06 2007-05-16 2007-05-16 2
#4 a4 2008-08-22 2007-05-16 2007-05-16 2
#5 a5 2009-06-24 2007-05-16 2007-05-16 2
#6 a6 2008-08-26 2010-05-16 2008-08-26 1
In older versions of dplyr (before 1.0.0, which introduced across) you can do:
DF %>%
  mutate_at(-1, lubridate::ymd) %>%
  mutate(CVD_date = pmin(Heart_attack, Stroke),
         CVD = as.integer(Heart_attack > Stroke) + 1)
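If you would rather avoid the dplyr and lubridate dependencies, the same idea can be sketched in base R. This is only a minimal sketch: it assumes every date is in the "YYYY/MM/DD" format shown in the question and that neither column contains missing values (with an NA in either column, both CVD_date and CVD would come out as NA for that row).

```r
# Rebuild the sample data from the question
DF <- data.frame(
  Sample_ID    = c("a1", "a2", "a3", "a4", "a5", "a6"),
  Heart_attack = c("2010/04/13", "2008/07/30", "2009/03/06",
                   "2008/08/22", "2009/06/24", "2008/08/26"),
  Stroke       = c("2007/05/17", "2009/05/16", "2007/05/16",
                   "2007/05/16", "2007/05/16", "2010/05/16")
)

# Parse the text columns into Date objects (assumed format: YYYY/MM/DD)
DF$Heart_attack <- as.Date(DF$Heart_attack, format = "%Y/%m/%d")
DF$Stroke       <- as.Date(DF$Stroke, format = "%Y/%m/%d")

# Row-wise earlier date of the two events
DF$CVD_date <- pmin(DF$Heart_attack, DF$Stroke)

# FALSE/TRUE becomes 0/1, so adding 1 gives 1 (heart attack first or same day)
# or 2 (stroke first)
DF$CVD <- as.integer(DF$Heart_attack > DF$Stroke) + 1L

print(DF)
```

Note that when both events fall on the same day, this comparison assigns 1, just like the dplyr version above.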
Upvotes: 1