split dataframe column containing a string of comma separated values into multiple columns

Question

I have a dataframe column that looks similar to the following:

ColA
""
"1"
"1,9"
"9,19"
"1,9,19"

There are hundreds of rows in the dataframe, and the numbers 1-30 can appear in the comma separated lists. I would like to separate these values into their own columns and indicate whether the number was present with a boolean.

colA_1  colA_9  colA_19

1       0       0                    
1       1       0
0       1       1
1       1       1

I know using grepl("[1][9]",df$ColA), for example, will grab the desired rows and I can then use that to create a new column for the number 19, but I can't figure out what the regular expression is to grab the single digit number (e.g., 1) without grabbing other numbers (e.g., 11 or 21). I'll admit I'm not super familiar with regular expressions and have usually not needed to use them. But my few google searches have been in vain.

If it were just a single string, a combination of unlist and strsplit would do the trick, but I don't know how to apply that to the entire column of the dataframe without just iterating over the rows.

I'd like to learn a solution that doesn't use a loop, it can be with regex or anything that gets the job done.

eastclintw00d · Accepted Answer

Here is a tidyverse solution. The seperate call gives you a warning but otherwise it works fine:

library(tidyverse)
df <- tibble(ColA = c("", "1", "1,9", "9,19", "1,9,19"))

df1 <- df %>% 
  mutate(n = row_number()) %>% 
  separate(ColA, into = str_c("ColA", 1:30)) %>% 
  gather(key, value, -n) %>% 
  filter(!is.na(value), value != "") %>% 
  mutate(
    key = str_c(str_sub(key, 1, 4), value),
    value2 = 1
  ) %>%
  select(-value) %>% 
  spread(key, value2)

## replace missings
df1[is.na(df1)] <- 0

split dataframe column containing a string of comma separated values into multiple columns

Answers (1)

Related Questions