Reputation: 73
My dataset represents patients who have been treated multiple times. The dataset is in long format: each patient gets treatment A, C, or S, or a combination of them. A and C are never combined.
Simply put, the data looks something like this:
df <- tibble(PatientID = c(1,1,1,2,2,3,3,3,3,4,4,5,5,5,6,6),
treatment = c("A", "A", "S", "C", "S", "S", "C", "C", NA, "C", NA, NA, "S", "A", "S", NA))
I would like to create a new variable based on whether each patient ever had treatment A, treatment C, or neither, so that the end result looks something like:
df <- tibble(PatientID = c(1,1,1,2,2,3,3,3,3,4,4,5,5,5,6,6),
treatment = c("A", "A", "S", "C", "S", "S", "C", "C", NA, "C", NA, NA, "S", "A", "S", NA),
group = c("A", "A", "A", "C", "C", "C", "C", "C", "C", "C", "C", "A", "A", "A", "S", "S"))
How can I best approach this? I'm struggling with how to deal with multiple observations per ID.
Thank you!
Upvotes: 1
Views: 66
Reputation: 440
You can use group_by() in combination with mutate() and case_when() to achieve this:
library(tidyverse)
df <- tibble(PatientID = c(1,1,1,2,2,3,3,3,3,4,4,5,5,5,6,6),
treatment = c("A", "A", "S", "C", "S", "S", "C", "C", NA, "C", NA, NA, "S", "A", "S", NA))
df %>%
  group_by(PatientID) %>%
  mutate(groups = case_when("A" %in% treatment ~ "A",
                            "C" %in% treatment ~ "C",
                            TRUE ~ "S"))
#> # A tibble: 16 × 3
#> # Groups:   PatientID [6]
#>    PatientID treatment groups
#>        <dbl> <chr>     <chr>
#>  1         1 A         A
#>  2         1 A         A
#>  3         1 S         A
#>  4         2 C         C
#>  5         2 S         C
#>  6         3 S         C
#>  7         3 C         C
#>  8         3 C         C
#>  9         3 <NA>      C
#> 10         4 C         C
#> 11         4 <NA>      C
#> 12         5 <NA>      A
#> 13         5 S         A
#> 14         5 A         A
#> 15         6 S         S
#> 16         6 <NA>      S
Created on 2022-08-18 with reprex v2.0.2
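A note on why this works with the missing values, as a minimal sketch rather than the only way to do it: `%in%` always returns TRUE or FALSE, whereas comparing with `==` propagates NA, so the NA rows cannot turn a condition into NA. The sketch below also names the column `group` (as in the question) and adds `ungroup()` so later steps are not silently computed per patient; the name `result` is only an illustrative choice.

```r
library(dplyr)

df <- tibble(
  PatientID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6),
  treatment = c("A", "A", "S", "C", "S", "S", "C", "C", NA, "C", NA, NA, "S", "A", "S", NA)
)

# `%in%` never yields NA, while `==` does when the vector contains NA
in_check <- "A" %in% c("S", NA)      # FALSE
eq_check <- any(c("S", NA) == "A")   # NA

# `result` is a hypothetical name; `group` matches the column asked for
result <- df %>%
  group_by(PatientID) %>%
  mutate(group = case_when(
    "A" %in% treatment ~ "A",   # patient had A at least once
    "C" %in% treatment ~ "C",   # otherwise, patient had C at least once
    TRUE ~ "S"                  # fallback for everyone else
  )) %>%
  ungroup()                     # drop the grouping once the label is assigned
```

One caveat with the fallback: `TRUE ~ "S"` labels every patient without A or C as "S", which would include a patient whose treatments are all NA. If that case can occur in the real data, it may need its own condition.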
Upvotes: 0