split character column into multiple columns

Question

I am trying to split a character column into multiple columns without losing the other columns in the data frame. The number of resulting columns varies with the input. An example makes this clearer:

df <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))

I want to split column c into multiple columns by sep = " ".

I tried seperate(df$c, " ", 10), but it doesn't work because I use a character as the separator. The 10 is arbitrary: I would rather have more columns than needed than drop information.
I also tried str_split_fixed(df$c, " ", 10), which works fine, but it drops columns a and b, and I don't know why or how to fix this.

Additional info: in the end I want to apply the command to a list of data frames.

Edit:

df1 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
df2 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla  \n ghfdghf \n hdhdh \n hjgfj \n td", "bla"))

map(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))

[[1]]
  a         b   c1    c2        c3      c4
1 1       bla one   two      three    <NA>
2 2      word bla   why   morebla   helpme
3 3 otherword bla    bla      <NA>    <NA>
4 4      nice  bla  <NA>      <NA>    <NA>

[[2]]
  a         b   c1     c2        c3      c4      c5   c6
1 1       bla one    two      three    <NA>    <NA> <NA>
2 2      word bla    why   morebla   helpme    <NA> <NA>
3 3 otherword bla   bla    ghfdghf   hdhdh   hjgfj    td
4 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>

df <- data.frame(unlist(list))

I guess this could cause problems because the number of columns differs between the list elements. Expected outcome:

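  a         b   c1     c2        c3      c4      c5   c6
1 1       bla one    two      three    <NA>    <NA> <NA>
2 2      word bla    why   morebla   helpme    <NA> <NA>
3 3 otherword bla     bla      <NA>    <NA>    <NA> <NA>
4 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>
5 1       bla one    two      three    <NA>    <NA> <NA>
6 2      word bla    why   morebla   helpme    <NA> <NA>
7 3 otherword bla   bla    ghfdghf   hdhdh   hjgfj    td
8 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>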

AnilGoyal · Accepted Answer

If you want tidyverse pipe syntax, you can use separate from tidyr together with stringr::str_count. str_count finds the largest number of separators in any row, which determines how many columns to create.

df <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))

library(tidyverse)
df %>% separate(c, into = paste0('c', seq_len(max(str_count(df$c, '\n')+1))), sep = '\n', fill = 'right')

  a         b   c1    c2        c3      c4
1 1       bla one   two      three    <NA>
2 2      word bla   why   morebla   helpme
3 3 otherword bla    bla      <NA>    <NA>
4 4      nice  bla  <NA>      <NA>    <NA>
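
To make the into = ... expression easier to follow, here is a minimal sketch that computes the same pieces step by step on the values of column c. The names c_col, n_sep, n_cols and into are only illustrative and are not part of the code above.

```r
library(stringr)

# Values of column c from the example data
c_col <- c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla")

# Number of separators in each row
n_sep <- str_count(c_col, "\n")

# The longest row has max(n_sep) separators, hence max(n_sep) + 1 pieces
n_cols <- max(n_sep) + 1

# Column names passed to separate() via `into`
into <- paste0("c", seq_len(n_cols))
print(into)
```

Rows with fewer pieces than n_cols are then padded on the right with NA because of fill = 'right'.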

To do this on a list of data frames, wrap the same call in map:

df1 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
df2 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla  \n ghfdghf \n hdhdh \n hjgfj \n td", "bla"))

map(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))

[[1]]
  a         b   c1    c2        c3      c4
1 1       bla one   two      three    <NA>
2 2      word bla   why   morebla   helpme
3 3 otherword bla    bla      <NA>    <NA>
4 4      nice  bla  <NA>      <NA>    <NA>

[[2]]
  a         b   c1     c2        c3      c4      c5   c6
1 1       bla one    two      three    <NA>    <NA> <NA>
2 2      word bla    why   morebla   helpme    <NA> <NA>
3 3 otherword bla   bla    ghfdghf   hdhdh   hjgfj    td
4 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>

Further edit in view of the revised question

Use map_dfr instead, which row-binds the results into a single data frame:

map_dfr(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))

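  a         b   c1     c2        c3      c4      c5   c6
1 1       bla one    two      three    <NA>    <NA> <NA>
2 2      word bla    why   morebla   helpme    <NA> <NA>
3 3 otherword bla     bla      <NA>    <NA>    <NA> <NA>
4 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>
5 1       bla one    two      three    <NA>    <NA> <NA>
6 2      word bla    why   morebla   helpme    <NA> <NA>
7 3 otherword bla   bla    ghfdghf   hdhdh   hjgfj    td
8 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>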
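
The differing column counts are not a problem here because map_dfr combines its results with dplyr::bind_rows, which matches columns by name and fills absent ones with NA. A minimal sketch of that behaviour, using two small made-up data frames (x and y are hypothetical, not the data from the question):

```r
library(dplyr)

# Two data frames with different sets of columns
x <- data.frame(a = 1:2, c1 = c("one", "bla"))
y <- data.frame(a = 3L, c1 = "bla", c2 = "td")

# Columns are matched by name; c2 is missing in x, so those rows get NA
combined <- bind_rows(x, y)
print(combined)
```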

That said, I see no reason to process each list item separately and row-bind afterwards. You can row-bind first and then call separate once, without map*:

df1 %>% rbind(df2) %>% separate(c, into = paste0('c', seq_len(max(str_count(.$c, '\n')+1))), sep = '\n', fill = 'right')

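  a         b   c1     c2        c3      c4      c5   c6
1 1       bla one    two      three    <NA>    <NA> <NA>
2 2      word bla    why   morebla   helpme    <NA> <NA>
3 3 otherword bla     bla      <NA>    <NA>    <NA> <NA>
4 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>
5 1       bla one    two      three    <NA>    <NA> <NA>
6 2      word bla    why   morebla   helpme    <NA> <NA>
7 3 otherword bla   bla    ghfdghf   hdhdh   hjgfj    td
8 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>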
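
As for the str_split_fixed attempt in the question: that function returns a character matrix built only from the vector it is given, so columns a and b are not deleted, they were simply never part of the result. A minimal sketch of binding the pieces back onto the other columns, assuming the separator is "\n" as in the example data (n_parts, parts and result are illustrative names):

```r
library(stringr)

df <- data.frame(
  a = 1:4,
  b = c("bla", "word", "otherword", "nice"),
  c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla")
)

# Number of pieces in the longest entry (separators + 1)
n_parts <- max(str_count(df$c, "\n")) + 1

# str_split_fixed() returns a character matrix, not a data frame
parts <- str_split_fixed(df$c, "\n", n_parts)
colnames(parts) <- paste0("c", seq_len(n_parts))

# Bind the pieces back onto the columns that should be kept
result <- cbind(df[c("a", "b")], parts)
print(result)
```

Note that str_split_fixed pads short rows with empty strings rather than NA, so the result differs slightly from the separate version.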
