nouse

Reputation: 3471

Splitting character strings into a data frame with one column for each character

In the following dataset:

df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = sapply(1:3, function(x) paste(sample(c("A","C","T","G"), 8, replace = TRUE), collapse = "")))

I want to split df$sequence into 8 additional columns, each containing one character of the string, in the original order.

I know how to split the character vectors, but this ends up in a list:

library(stringr)
list1 <- str_extract_all(df$sequence, boundary("character"))
list1
[[1]]
[1] "A" "T" "C" "G" "T" "G" "A" "A"

[[2]]
[1] "T" "C" "C" "T" "A" "T" "A" "T"

[[3]]
[1] "C" "G" "T" "T" "A" "A" "G" "G"

str(list1)
List of 3
 $ : chr [1:8] "A" "T" "C" "G" ...
 $ : chr [1:8] "T" "C" "C" "T" ...
 $ : chr [1:8] "C" "G" "T" "T" ...

How can I convert this list into a data frame, or is there a simpler way?

edit:

I could go with:

df$pos1 <- sapply(list1, function(x) x[1])
df$pos2 <- sapply(list1, function(x) x[2])

but I guess there are better solutions.
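
The per-position pattern above can be repeated for every position with a loop. This is only a minimal sketch: it uses fixed sequences (copied from the printed list above) instead of the random ones, swaps in base R's strsplit for str_extract_all so it runs without stringr, and assumes all sequences have the same length.

```r
# Fixed example data standing in for the random sequences in the question
df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = c("ATCGTGAA", "TCCTATAT", "CGTTAAGG"),
                 stringsAsFactors = FALSE)

# Base R stand-in for str_extract_all(df$sequence, boundary("character"))
list1 <- strsplit(df$sequence, "")

# Assumption: every sequence has the same number of characters
n_pos <- nchar(df$sequence[1])

# Add pos1 ... pos8 one at a time, the same way as pos1 and pos2 above
for (i in seq_len(n_pos)) {
  df[[paste0("pos", i)]] <- vapply(list1, function(x) x[i], character(1))
}
```

vapply is used instead of sapply here so that each new column is guaranteed to be a character vector.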

Upvotes: 2

Views: 59

Answers (2)

akrun

Reputation: 887741

We may use a regex to insert a delimiter between every pair of adjacent characters and then read the result with read.csv:

read.csv(text = gsub("(?<=.)(?=.)", ",", df$sequence, perl = TRUE), 
       header = FALSE, colClasses = "character")
  V1 V2 V3 V4 V5 V6 V7 V8
1  A  C  A  A  C  C  C  A
2  G  T  A  G  T  C  C  C
3  C  T  G  G  G  C  G  A
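
To see what the regex step does on its own, the two stages can be run separately. This is a small sketch on fixed sequences (taken from the question's printed list, not the random ones used for the output above); the names sequences, delimited and chars are only for illustration.

```r
sequences <- c("ATCGTGAA", "TCCTATAT", "CGTTAAGG")

# (?<=.)(?=.) is a zero-width match: a position with a character on both sides.
# A comma is inserted at each such position, i.e. between adjacent characters.
delimited <- gsub("(?<=.)(?=.)", ",", sequences, perl = TRUE)
print(delimited)

# Each element of the character vector is read as one line of CSV text
chars <- read.csv(text = delimited, header = FALSE, colClasses = "character")
print(chars)
```

colClasses = "character" matters for this kind of data: without it, a column holding only "T" would be converted to logical TRUE by read.csv.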

Upvotes: 1

Jilber Urbina

Reputation: 61214

Using base R:

data.frame(do.call(rbind, strsplit(df$sequence, "")))
  X1 X2 X3 X4 X5 X6 X7 X8
1  T  A  A  T  C  A  A  A
2  T  T  A  A  A  T  G  G
3  C  G  A  A  T  C  C  T
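
If the new columns should end up next to barcode and sequence, as the question asks, the matrix can be bound back onto df. A sketch on fixed sequences; the pos1 ... pos8 column names are an assumption borrowed from the question's edit, and char_mat and result are illustrative names.

```r
df <- data.frame(barcode = c("B1", "B2", "B3"),
                 sequence = c("ATCGTGAA", "TCCTATAT", "CGTTAAGG"),
                 stringsAsFactors = FALSE)

# Split each string into single characters and stack the pieces as matrix rows
char_mat <- do.call(rbind, strsplit(df$sequence, ""))

# Assumed column names, following the pos1/pos2 naming in the question
colnames(char_mat) <- paste0("pos", seq_len(ncol(char_mat)))

# Bind the per-character columns onto the original data frame
result <- cbind(df, as.data.frame(char_mat, stringsAsFactors = FALSE))
print(result)
```

This relies on all sequences having the same length; with unequal lengths rbind would recycle the shorter vectors and give a warning.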

Upvotes: 1
