a_wise

Reputation: 71

Splitting a dataframe column where new column values depend upon original data

I often work with dataframes that have character columns whose values need to be separated into several columns. This results from a "select multiple" option in the data entry programme (which, unfortunately, I cannot change). I have tried tidyr::separate, but it fills the new columns by position rather than putting each value in the column of the same name. An example:

require(tidyr)
df = data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"))

df <- df %>%
  separate(sick, c("diarrhoea", "cough", "malaria"),
           sep = " ", fill = "right", remove = FALSE)

But I want the result to look like this:

df2 = data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"),
  diarrhoea = c(NA, NA, "diarrhoea"),
  cough = c(NA, NA, NA),
  malaria = c(NA, "malaria", "malaria"))

Any help in the right direction would be much appreciated.

Upvotes: 1

Views: 26

Answers (1)

akrun

Reputation: 887851

We can try separate_rows (from tidyr) together with dcast (from reshape2):

library(tidyr)
library(reshape2)
library(dplyr)
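separate_rows(df, sick) %>%  # one row per reported illness
  # factor levels fix the set and order of the new columns; sick1 is the cell value
  mutate(sick = factor(sick, levels = c("diarrhoea", "cough", "malaria")),
         sick1 = sick) %>%
  dcast(., x ~ sick, value.var = "sick1", drop = FALSE) %>%  # back to one row per x
  bind_cols(., df[2]) %>%  # restore the original sick column
  select(x, sick, diarrhoea, cough, malaria)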
#  x              sick diarrhoea cough malaria
#1 1              <NA>      <NA>  <NA>    <NA>
#2 2           malaria      <NA>  <NA> malaria
#3 3 diarrhoea malaria diarrhoea  <NA> malaria

Another option is cSplit from splitstackshape with dcast from data.table:

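library(splitstackshape)
library(data.table)  # attach so that dcast below is the data.table version
# long format: one row per illness, with factor levels fixing the column order
long <- cSplit(df, "sick", " ", "long")[, sick := factor(sick,
    levels = c("diarrhoea", "cough", "malaria"))]
dcast(long, x ~ sick, value.var = "sick", drop = FALSE)[, sick := df$sick][]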
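
For comparison, the same idea can be sketched in base R without any packages. This is only a minimal illustration, assuming the full set of options is known in advance; the name options_all is made up for this example and is not part of any package above.

```r
# Base R sketch: one column per known option, matched by value, not by position
df <- data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"),
  stringsAsFactors = FALSE
)

# Assumption: these are all the choices the data entry form offers
options_all <- c("diarrhoea", "cough", "malaria")

# Split each entry on single spaces; an NA entry stays NA
parts <- strsplit(df$sick, " ", fixed = TRUE)

for (opt in options_all) {
  # Keep the option's name where it was selected, NA otherwise
  df[[opt]] <- vapply(
    parts,
    function(p) if (opt %in% p) opt else NA_character_,
    character(1)
  )
}

df
```

Because each column is built by looking up its own name in the split values, an option that nobody selected (cough here) still gets a column, filled with NA.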

Upvotes: 1
