splitting all the columns in a dataframe based on their value and delimiter

Question

I have a dataframe as follows:

df <- data.frame(s1=c("a","a/b","b","a","a/b"),s2=c("ab/bb","bb","ab","ab","bb"),s3=c("Doa","Doa","Dob/Doa","Dob/Doa","Dob"))

   s1    s2      s3
1   a ab/bb     Doa
2 a/b    bb     Doa
3   b    ab Dob/Doa
4   a    ab Dob/Doa
5 a/b    bb     Dob

Each column could take one of two values or both separated by a "/". I would like to break these down into binary sets of columns based on their values.

The desired data frame would be:

   a   b   ab   bb   Doa   Dob
1  1   0    1    1    1    0
2  1   1    0    1    1    0
3  0   1    1    0    1    1
4  1   0    1    0    1    1
5  1   1    0    1    0    1

I tried doing this with tidyr::separate and tapply, though it got fairly complicated as I had to specify column names for every pair. There were many columns.

akrun · Accepted Answer

Another approach is to usepivot_longer into 'long' format and then use separate_rows to split the 'value' column and reshape into 'wide' format

library(dplyr)
library(tidyr)
df %>%
    mutate(rn = row_number()) %>%
    pivot_longer(cols = -rn) %>%
    separate_rows(value) %>%
    mutate(i1 = 1) %>%
    select(-name) %>%
    pivot_wider(names_from = value, values_from = i1, values_fill = list(i1 = 0)) %>%
    select(-rn)
# A tibble: 5 x 6
#      a    ab    bb   Doa     b   Dob
#       
#1     1     1     1     1     0     0
#2     1     0     1     1     1     0
#3     0     1     0     1     1     1
#4     1     1     0     1     0     1
#5     1     0     1     0     1     1

Or using base R with table and strsplit

+(table(stack(setNames(strsplit(as.character(unlist(df)), "/",
       fixed = TRUE), c(row(df))))[2:1]) > 0)
#  values
#ind a ab b bb Doa Dob
#  1 1  1 0  1   1   0
#  2 1  0 1  1   1   0
#  3 0  1 1  0   1   1
#  4 1  1 0  0   1   1
#  5 1  0 1  1   0   1

splitting all the columns in a dataframe based on their value and delimiter

Answers (2)

Related Questions