Jaume
Jaume

Reputation: 189

Split data frame string column into multiple columns (comma separated characters)

I am trying to split a column of comma separated characters into several columns (as many as different characters overall). I read similar questions such as this:

Split data frame string column into multiple columns

But the solutions were based on a small number of possible characters, so that the new columns could be named in advanced before the columns was split.

This is my example:

subject <- c(1,2,3)
 letters <- c("a, b, f, g", "b, g, m, l", "g, m, z")

df1 <- data.frame(subject, letters)

df1

 subject    letters
1       1 a, b, f, g
2       2 b, g, m, l
3       3    g, m, z

The desired result would be:

  subject a b f g m z
1       1 1 1 1 1 0 0
2       2 0 1 0 1 1 0
3       3 0 0 0 1 1 1

Thanks in advance for any help.

Upvotes: 4

Views: 1743

Answers (4)

Andre Wildberg
Andre Wildberg

Reputation: 19088

A base R approach that grepls the unique list of letters against each row of trimws and strsplited letters. Has the advantage of being able to easily interact with each step manually in case any adjustment needs to be done, e.g. order changes or leaving out letters etc.

list_letter <- sapply(strsplit(df1$letters, ","), trimws)
letter <- unique(unlist(list_letter))
letter
[1] "a" "b" "f" "g" "m" "l" "z"

data.frame( subject=df1$subject, 
            sapply(letter, function(x) sapply(list_letter, function(y) 
              any(grepl(x, y))))*1 )
  subject a b f g m l z
1       1 1 1 1 1 0 0 0
2       2 0 1 0 1 1 1 0
3       3 0 0 0 1 1 0 1

Upvotes: 2

Chris Ruehlemann
Chris Ruehlemann

Reputation: 21400

Another solution:

library(dplyr)
library(tidyr)
df1 %>%
  mutate(row = row_number()) %>%
  separate_rows(letters, sep = ', ') %>%
  pivot_wider(names_from = letters, values_from = letters, 
              values_fn = function(x) 1, values_fill = 0) %>%
  select(-row)
# A tibble: 3 × 8
  subject     a     b     f     g     m     l     z
    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1       1     1     1     1     1     0     0     0
2       2     0     1     0     1     1     1     0
3       3     0     0     0     1     1     0     1

Upvotes: 2

nniloc
nniloc

Reputation: 4243

One option using str_split, unnest_longer and table

subject <- c(1,2,3)
letters <- c("a, b, f, g", "b, g, m, l", "g, m, z")

df1 <- data.frame(subject, letters)


library(tidyverse)

df1 %>%
  mutate(letters = str_split(letters, ', ')) %>%
  unnest_longer(letters) %>%
  table
#>        letters
#> subject a b f g l m z
#>       1 1 1 1 1 0 0 0
#>       2 0 1 0 1 1 1 0
#>       3 0 0 0 1 0 1 1

Created on 2022-02-10 by the reprex package (v2.0.0)


Seeing some of the other answers, separate_rows is a better solution here

df1 %>%
  separate_rows(letters) %>%
  table

Upvotes: 4

AndrewGB
AndrewGB

Reputation: 16836

Another tidyverse possibility:

library(tidyverse)

df1 %>%
  separate_rows(letters) %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = letters, values_fill = 0)

  subject     a     b     f     g     m     l     z
    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1       1     1     1     1     1     0     0     0
2       2     0     1     0     1     1     1     0
3       3     0     0     0     1     1     0     1

Upvotes: 2

Related Questions