Reputation: 145
What I am trying to do is split a character column into multiple columns without losing the additional data in the df and the number of columns is variable depending on the input. I guess it's easier with an example:
df <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
I want to split column c into multiple columns by sep = "\n".
I tried using seperate(df$c, "\n", 10)
but it doesn't work because I use character as separator. 10 is just an idea, so that I rather have more columns than needed instead of dropping info.
I tried using str_split_fixed(df$c, "\n", 10)
which works fine, but it deletes column a and b and I don't know why or how I could fix this.
Additional info: in the end I want to use the command on a list.
Edit:
df1 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
df2 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla \n ghfdghf \n hdhdh \n hjgfj \n td", "bla"))
map(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))
[[1]]
a b c1 c2 c3 c4
1 1 bla one two three <NA>
2 2 word bla why morebla helpme
3 3 otherword bla bla <NA> <NA>
4 4 nice bla <NA> <NA> <NA>
[[2]]
a b c1 c2 c3 c4 c5 c6
1 1 bla one two three <NA> <NA> <NA>
2 2 word bla why morebla helpme <NA> <NA>
3 3 otherword bla bla ghfdghf hdhdh hjgfj td
4 4 nice bla <NA> <NA> <NA> <NA> <NA>
df <- data.frame(unlist(list))
I guess this could cause problems as number of columns are not the same within the list. Expected outcome:
a b c1 c2 c3 c4 c5 c6
1 1 bla one two three <NA> <NA> <NA>
2 2 word bla why morebla helpme <NA> <NA>
3 3 otherword bla bla <NA> <NA> <NA> <NA>
4 4 nice bla <NA> <NA> <NA> <NA> <NA>
5 1 bla one two three <NA> <NA> <NA>
6 2 word bla why morebla helpme <NA> <NA>
7 3 otherword bla bla ghfdghf hdhdh hjgfj td
8 4 nice bla <NA> <NA> <NA> <NA> <NA>
Upvotes: 0
Views: 774
Reputation: 26218
If doing in tidyverse/dplyr pipe kinda syntax, you may use separate
from tidyr
in conjunction with stringr::str_count
which does exactly as you require.
df <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
library(tidyverse)
df %>% separate(c, into = paste0('c', seq_len(max(str_count(df$c, '\n')+1))), sep = '\n', fill = 'right')
a b c1 c2 c3 c4
1 1 bla one two three <NA>
2 2 word bla why morebla helpme
3 3 otherword bla bla <NA> <NA>
4 4 nice bla <NA> <NA> <NA>
For doing it on list of data.frames, do it like this
df1 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
df2 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla \n ghfdghf \n hdhdh \n hjgfj \n td", "bla"))
map(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))
[[1]]
a b c1 c2 c3 c4
1 1 bla one two three <NA>
2 2 word bla why morebla helpme
3 3 otherword bla bla <NA> <NA>
4 4 nice bla <NA> <NA> <NA>
[[2]]
a b c1 c2 c3 c4 c5 c6
1 1 bla one two three <NA> <NA> <NA>
2 2 word bla why morebla helpme <NA> <NA>
3 3 otherword bla bla ghfdghf hdhdh hjgfj td
4 4 nice bla <NA> <NA> <NA> <NA> <NA>
Further Edit in view of revised question
map_dfr
insteadmap_dfr(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))
a b c1 c2 c3 c4 c5 c6
1 1 bla one two three <NA> <NA> <NA>
2 2 word bla why morebla helpme <NA> <NA>
3 3 otherword bla bla <NA> <NA> <NA> <NA>
4 4 nice bla <NA> <NA> <NA> <NA> <NA>
5 1 bla one two three <NA> <NA> <NA>
6 2 word bla why morebla helpme <NA> <NA>
7 3 otherword bla bla ghfdghf hdhdh hjgfj td
8 4 nice bla <NA> <NA> <NA> <NA> <NA>
map*
df1 %>% rbind(df2) %>% separate(c, into = paste0('c', seq_len(max(str_count(.$c, '\n')+1))), sep = '\n', fill = 'right')
a b c1 c2 c3 c4 c5 c6
1 1 bla one two three <NA> <NA> <NA>
2 2 word bla why morebla helpme <NA> <NA>
3 3 otherword bla bla <NA> <NA> <NA> <NA>
4 4 nice bla <NA> <NA> <NA> <NA> <NA>
5 1 bla one two three <NA> <NA> <NA>
6 2 word bla why morebla helpme <NA> <NA>
7 3 otherword bla bla ghfdghf hdhdh hjgfj td
8 4 nice bla <NA> <NA> <NA> <NA> <NA>
Upvotes: 2
Reputation: 1081
cc = strsplit(df$c, "\n")
l = max(lengths(cc))
CC = lapply(cc, function(x) c(x, rep(NA, l-length(x))))
CC = do.call(rbind, CC)
cbind(df[c('a', 'b')], CC)
a b 1 2 3 4
1 bla one two three <NA>
2 word bla why morebla helpme
3 otherword bla bla <NA> <NA>
4 nice bla <NA> <NA> <NA>
Upvotes: 1