Reputation: 11
I have 1 dataframe like this
id date_0_0 date_1_0 date_2_0 date_3_0 type_0_0 type_1_0 type_2_0 type_3_0
1 5/17/1966 . . . C10 . . .
2 3/7/2012 . . . C34 . . .
3 . . . . . . . .
4 5/11/1998 . . 2/1/1984 C34 . . C34
Those date_?_0
are corresponding to type_?_0
and I have around 200 set of those variables.
What I need to do is I need to create a new varaible ,lets call it cancer
.
If someone did not have any C34 among their type_?_0
variables then their cancer
=N/A.
If someone have C34 among their type_?_0
variables then their cancer
=corresponding date.
If someone have multiple C34 among their type_?_0
variables then their cancer
=their minimumcorresponding date.
So my ideal output is
id cancer
1 .
2 3/7/2012
3 .
4 2/1/1984
I have to do it in R but I usually use SAS....In SAS I just create three arrays, first one is date, second one is type and third one is cancer, then I loop through date and type array to assign new values for cancer array. Finally I create a new varaible equal to min(cancer array).
Please help. Thanks !!!
Upvotes: 0
Views: 55
Reputation: 14774
Try:
library(tidyverse)
df %>%
pivot_longer(-id) %>%
group_by(id, idx = gsub('.*(\\d+)_(\\d+)', '\\1\\2', name))%>%
mutate(value = case_when(any(value == 'C34') & grepl('\\d+\\/', value) ~ value))%>%
group_by(id) %>%
arrange(as.Date(as.character(value), '%d/%m/%Y')) %>%
select(id, cancer = value) %>%
slice(1L)
Output:
# A tibble: 4 x 2
# Groups: id [4]
id cancer
<int> <fct>
1 1 NA
2 2 3/7/2012
3 3 NA
4 4 2/1/1984
Upvotes: 2