Reputation: 3471
In the following dataset:
df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = sapply(1:3, function(x)
                   paste(sample(c("A", "C", "T", "G"), 8, replace = TRUE), collapse = "")),
                 stringsAsFactors = FALSE)
I want to split df$sequence into 8 additional columns, one per character, keeping the characters in their original order.
I know how to split the character vectors, but this ends up in a list:
library(stringr)
list1 <- str_extract_all(df$sequence, boundary("character"))
[[1]]
[1] "A" "T" "C" "G" "T" "G" "A" "A"
[[2]]
[1] "T" "C" "C" "T" "A" "T" "A" "T"
[[3]]
[1] "C" "G" "T" "T" "A" "A" "G" "G"
str(list1)
List of 3
$ : chr [1:8] "A" "T" "C" "G" ...
$ : chr [1:8] "T" "C" "C" "T" ...
$ : chr [1:8] "C" "G" "T" "T" ...
How can I convert this list into a data frame, or is there a simpler way?
Edit:
I could go with:
df$pos1 <- sapply(list1, function(x) x[1])
df$pos2 <- sapply(list1, function(x) x[2])
but I suspect there are better solutions.
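The pos1/pos2 pattern from the edit can be collapsed into a single loop over positions. This is a minimal sketch assuming every sequence has exactly 8 characters; the sequences are fixed to the ones printed above so the result is reproducible, and base strsplit() stands in for str_extract_all() so the block runs without stringr.

```r
# Example data with the sequences shown in the question's output
# (stringsAsFactors = FALSE keeps them as character on R < 4.0)
df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = c("ATCGTGAA", "TCCTATAT", "CGTTAAGG"),
                 stringsAsFactors = FALSE)

# Split each sequence into single characters; gives the same list as
# str_extract_all(df$sequence, boundary("character"))
list1 <- strsplit(df$sequence, "")

# Assumes all sequences are 8 characters long
for (i in 1:8) {
  # `[` picks the i-th character from each sequence's vector
  df[[paste0("pos", i)]] <- vapply(list1, `[`, character(1), i)
}
df
```

vapply() is used instead of sapply() so that the result is guaranteed to be a character vector of length 1 per sequence.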
Upvotes: 2
Views: 59
Reputation: 887741
We can use a regex with lookarounds to insert a comma between every pair of adjacent characters, then read the result with read.csv:
read.csv(text = gsub("(?<=.)(?=.)", ",", df$sequence, perl = TRUE),
header = FALSE, colClasses = "character")
V1 V2 V3 V4 V5 V6 V7 V8
1 A C A A C C C A
2 G T A G T C C C
3 C T G G G C G A
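To keep the barcode next to the split columns, the read.csv result can be bound back to df. This is a sketch reusing the same regex; the pos1 to pos8 column names are chosen for illustration. colClasses = "character" matters here: without it, read.csv would convert a column made up only of "T" values to logical TRUE.

```r
df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = c("ATCGTGAA", "TCCTATAT", "CGTTAAGG"),
                 stringsAsFactors = FALSE)

# Insert a comma at every position that is preceded by a character (?<=.)
# and followed by one (?=.), e.g. "ATCG" -> "A,T,C,G"
csv_text <- gsub("(?<=.)(?=.)", ",", df$sequence, perl = TRUE)

# Each element of csv_text is read as one line of CSV;
# colClasses = "character" stops an all-"T" column becoming logical
pos <- read.csv(text = csv_text, header = FALSE, colClasses = "character")
names(pos) <- paste0("pos", seq_along(pos))

res <- cbind(df, pos)
res
```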
Upvotes: 1
Reputation: 61214
Using base R:
data.frame(do.call(rbind, strsplit(df$sequence, "")))
X1 X2 X3 X4 X5 X6 X7 X8
1 T A A T C A A A
2 T T A A A T G G
3 C G A A T C C T
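The same idea also converts list1 from the question into a data frame directly: when every element has the same length, unlist() can fill a matrix one sequence per row. This is a sketch assuming equal-length sequences; strsplit() replaces str_extract_all() so it runs without stringr, and the pos column names are illustrative.

```r
df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = c("ATCGTGAA", "TCCTATAT", "CGTTAAGG"),
                 stringsAsFactors = FALSE)

list1 <- strsplit(df$sequence, "")

# The matrix approach is only valid if all sequences have the same length
stopifnot(length(unique(lengths(list1))) == 1)

# unlist() concatenates the vectors in order; byrow = TRUE puts each
# sequence on its own row
m <- matrix(unlist(list1), nrow = length(list1), byrow = TRUE)
colnames(m) <- paste0("pos", seq_len(ncol(m)))

res <- cbind(df["barcode"], as.data.frame(m, stringsAsFactors = FALSE))
res
```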
Upvotes: 1