Reputation: 165
I have a dataset with PatientID and their Disease test results and they are as follows :
Id Result
1 Strep A: Positive
2 Flu A: Negative, Flu B: Negative
3 Rsv: Positive, RsvA: Negative, RsvB: Positive
4 Strep A: Negative
5 Flu A: Negative, Flu B: Negative
6 Flu A: Negative, Flu B: Negative
7 Strep A: Positive
How can I split the Result
column as follows :
Id Result_Strep A Result_Flu A Result_Flu B Result_Rsv Result_RsvA Result_RsvB
1 Positive NA NA NA NA NA
2 NA Negative Negative NA NA NA
3 NA NA NA Positive Negative Positive
4 Negative NA NA NA NA NA
5 NA Negative Negative NA NA NA
6 NA Negative Negative NA NA NA
7 Positive NA NA NA NA NA
dput of data
structure(list(Id = 1:7, Result = c("Strep A: Positive", "Flu A: Negative, Flu B: Negative",
"Rsv: Positive, RsvA: Negative, RsvB: Positive", "Strep A: Negative",
"Flu A: Negative, Flu B: Negative", "Flu A: Negative, Flu B: Negative",
"Strep A: Positive")), row.names = c(NA, -7L), class = "data.frame")
Upvotes: 1
Views: 74
Reputation: 887971
We can use separate_rows
to split at the ,
, then separate
the column into two, and reshape into 'wide' format
library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
separate_rows(Result, sep=", ") %>%
separate(Result, into = c("Result1", "Result2"), sep=":\\s*") %>%
mutate(Result1 = str_c("Result_", Result1)) %>%
# in case of duplicate elements uncomment the commented code below
#group_by(Result1) %>%
#mutate(rn = row_number()) %>%
#ungroup %>%
pivot_wider(names_from = Result1, values_from = Result2)# %>%
#select(-rn)
# A tibble: 7 x 7
# Id `Result_Strep A` `Result_Flu A` `Result_Flu B` Result_Rsv Result_RsvA Result_RsvB
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
#1 1 Positive <NA> <NA> <NA> <NA> <NA>
#2 2 <NA> Negative Negative <NA> <NA> <NA>
#3 3 <NA> <NA> <NA> Positive Negative Positive
#4 4 Negative <NA> <NA> <NA> <NA> <NA>
#5 5 <NA> Negative Negative <NA> <NA> <NA>
#6 6 <NA> Negative Negative <NA> <NA> <NA>
#7 7 Positive <NA> <NA> <NA> <NA> <NA>
df1 <- structure(list(Id = 1:7, Result = c("Strep A: Positive",
"Flu A: Negative, Flu B: Negative",
"Rsv: Positive, RsvA: Negative, RsvB: Positive", "Strep A: Negative",
"Flu A: Negative, Flu B: Negative", "Flu A: Negative, Flu B: Negative",
"Strep A: Positive")), class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 2