Reputation: 71
I often work with dataframes that have columns with character string values that need to be separated. This results from a "select multiple" option in the data entry programme (which I cannot change unfortunately). I have tried tidyr::separate
but that does not order the results properly. An example:
require(tidyr)
df = data.frame(
x = 1:3,
sick = c(NA, "malaria", "diarrhoea malaria"))
df <- df %>%
separate(sick, c("diarrhoea", "cough", "malaria"),
sep = " ", fill = "right", remove = FALSE)
But I want the result to look like this:
df2 = data.frame(
x = 1:3,
sick = c(NA, "malaria", "diarrhoea malaria"),
diarrhoea = c(NA, NA, "diarrhoea"),
cough = c(NA, NA, NA),
malaria = c(NA, "malaria", "malaria"))
Any help in the right direction would be much appreciated.
Upvotes: 1
Views: 26
Reputation: 887851
We can try with separate_rows
and dcast
library(tidyr)
library(reshape2)
library(dplyr)
separate_rows(df, sick) %>%
mutate(sick = factor(sick, levels = c("diarrhoea", "cough", "malaria")), sick1 = sick) %>%
dcast(., x~sick, value.var = "sick1", drop=FALSE) %>%
bind_cols(., df[2]) %>%
select(x, sick, diarrhoea, cough, malaria)
# x sick diarrhoea cough malaria
#1 1 <NA> <NA> <NA> <NA>
#2 2 malaria <NA> <NA> malaria
#3 3 diarrhoea malaria diarrhoea <NA> malaria
Or another option is using cSplit
from splitstackshape
with dcast
from data.table
library(splitstackshape)
dcast(cSplit(df, "sick", " ", "long")[, sick:= factor(sick, levels =
c("diarrhoea", "cough", "malaria"))], x~sick, value.var = "sick", drop = FALSE)[,
sick := df$sick][]
Upvotes: 1