Splitting character strings into a data frame with one column for each character

Question

In the following dataset:

df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = sapply(1:3, function(x) paste(sample(c("A","C","T","G"), 8, replace=TRUE), collapse="")),
                 stringsAsFactors = FALSE)

I want to split df$sequence into 8 additional columns, each containing one character of the string, in the correct order.

I know how to split the character vectors, but this ends up in a list:

library(stringr)
list1 <- str_extract_all(df$sequence,boundary("character"))
[[1]]
[1] "A" "T" "C" "G" "T" "G" "A" "A"

[[2]]
[1] "T" "C" "C" "T" "A" "T" "A" "T"

[[3]]
[1] "C" "G" "T" "T" "A" "A" "G" "G"

str(list1)
List of 3
 $ : chr [1:8] "A" "T" "C" "G" ...
 $ : chr [1:8] "T" "C" "C" "T" ...
 $ : chr [1:8] "C" "G" "T" "T" ...

How can I convert this list into a data frame, or is there a simpler way?

Edit:

I could go with:

df$pos1 <- sapply(list1, function(x) x[1])
df$pos2 <- sapply(list1, function(x) x[2])

but I guess there are better solutions.

Jilber Urbina · Accepted Answer

Using base R:

> data.frame(do.call(rbind, strsplit(df$sequence, "")))
  X1 X2 X3 X4 X5 X6 X7 X8
1  T  A  A  T  C  A  A  A
2  T  T  A  A  A  T  G  G
3  C  G  A  A  T  C  C  T
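
To get the "8 additional columns" the question asks for, the split matrix can be bound back onto the original data frame. The following is a minimal sketch of that idea, not part of the original answer. It assumes every sequence has the same length, uses the three sequences shown in the question's output as fixed data instead of sample(), and borrows the pos1, pos2, ... column names from the question's edit. The as.character() call is a precaution for R versions before 4.0, where data.frame() turns strings into factors by default and strsplit() would reject them.

```r
# Fixed example data taken from the question's printed output,
# so the result does not depend on sample().
df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = c("ATCGTGAA", "TCCTATAT", "CGTTAAGG"),
                 stringsAsFactors = FALSE)

# Split each string into single characters and stack the pieces row-wise.
# as.character() guards against a factor column.
chars <- do.call(rbind, strsplit(as.character(df$sequence), ""))

# Name the new columns pos1 ... pos8 (naming assumed from the question's edit).
colnames(chars) <- paste0("pos", seq_len(ncol(chars)))

# Convert the character matrix to a data frame and attach it to the original.
result <- cbind(df, as.data.frame(chars, stringsAsFactors = FALSE))
result
```

If the sequences can differ in length, rbind() will not pad the shorter ones, so the strings would need to be padded or handled separately before this step.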
