user46543
user46543

Reputation: 1053

Split one column to two columns in R with looping

Actually I have same problem with this case strsplit one column with exact information into two column

That question already solved, just my data is just looks like

      SNP Geno AlleleA AlleleB AlleleC AlleleD AlleleE
1 marker1   G1      AA      AA      AA      AA      AA
2 marker2   G1      TT      TT      TT      TT      TT
3 marker3   G1      TT      TT      TT      TT      TT
4 marker1   G2      CC      CC      CC      CC      CC
5 marker2   G2      AA      AA      AA      AA      AA
6 marker3   G2      TT      TT      TT      TT      TT
7 marker1   G3      GG      GG      GG      GG      GG
8 marker2   G3      AA      AA      AA      AA      AA
9 marker3   G3      TT      TT      TT      TT      TT

dput output:

structure(list(SNP = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L), .Label = c("marker1", "marker2", "marker3"), class = "factor"), 
    Geno = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("G1", 
    "G2", "G3"), class = "factor"), AlleleA = structure(c(1L, 
    4L, 4L, 2L, 1L, 4L, 3L, 1L, 4L), .Label = c("AA", "CC", "GG", 
    "TT"), class = "factor"), AlleleB = structure(c(1L, 4L, 4L, 
    2L, 1L, 4L, 3L, 1L, 4L), class = "factor", .Label = c("AA", 
    "CC", "GG", "TT")), AlleleC = structure(c(1L, 4L, 4L, 2L, 
    1L, 4L, 3L, 1L, 4L), class = "factor", .Label = c("AA", "CC", 
    "GG", "TT")), AlleleD = structure(c(1L, 4L, 4L, 2L, 1L, 4L, 
    3L, 1L, 4L), class = "factor", .Label = c("AA", "CC", "GG", 
    "TT")), AlleleE = structure(c(1L, 4L, 4L, 2L, 1L, 4L, 3L, 
    1L, 4L), class = "factor", .Label = c("AA", "CC", "GG", "TT"
    ))), .Names = c("SNP", "Geno", "AlleleA", "AlleleB", "AlleleC", 
"AlleleD", "AlleleE"), row.names = c(NA, -9L), class = "data.frame")

On that question he just has one columns that want to split to two columns. The problem is I have 5000 columns (AlleleA, AlleleB.........etc) that want to split (each one column to two columns)

I've tried to use looping like this but it doesnt work,

for(i in colnames(dat)){
  dat1 <- data.frame(do.call(rbind, strsplit(as.vector(sprintf("dat$%s",i)), split = "")))
}

I will wait your light, thank you

Upvotes: 2

Views: 1084

Answers (3)

akrun
akrun

Reputation: 887138

Another option is

library(qdap)
res <- colsplit2df(dat, splitcols=2:ncol(dat),sep='')
colnames(res)[-1] <- make.names(rep(colnames(dat)[-1],each=2), unique=TRUE)
res[1:3,1:5]
#      SNP Geno Geno.1 AlleleA AlleleA.1
#1 marker1    G      1       A         A
#2 marker2    G      1       T         T
#3 marker3    G      1       T         T

Or only for Allele columns

colsplit2df(dat, splitcols=grep('Allele', names(dat)),sep='')

Edit (Tyler Rinker)

May I suggest editing the column names of the data.frame using setNames first as follows:

setNames(dat, gsub("([A-Z]{1}[a-z]+[A-Z])", "\\1.1&\\1.2", names(dat))) %>%
    colsplit2df(splitcols=3:ncol(dat), sep='')

Upvotes: 3

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

You can use cSplit from my "splitstackshape" package with the argument stripWhite = FALSE.

For example, if we wanted to split all the "Allele*" columns, we would do:

library(splitstackshape)
cSplit(mydf, grep("Allele", names(mydf)), "", stripWhite = FALSE)
#        SNP Geno AlleleA_1 AlleleA_2 AlleleB_1 AlleleB_2 AlleleC_1
# 1: marker1   G1         A         A         A         A         A
# 2: marker2   G1         T         T         T         T         T
# 3: marker3   G1         T         T         T         T         T
# 4: marker1   G2         C         C         C         C         C
# 5: marker2   G2         A         A         A         A         A
# 6: marker3   G2         T         T         T         T         T
# 7: marker1   G3         G         G         G         G         G
# 8: marker2   G3         A         A         A         A         A
# 9: marker3   G3         T         T         T         T         T
#    AlleleC_2 AlleleD_1 AlleleD_2 AlleleE_1 AlleleE_2
# 1:         A         A         A         A         A
# 2:         T         T         T         T         T
# 3:         T         T         T         T         T
# 4:         C         C         C         C         C
# 5:         A         A         A         A         A
# 6:         T         T         T         T         T
# 7:         G         G         G         G         G
# 8:         A         A         A         A         A
# 9:         T         T         T         T         T

Upvotes: 4

Davide Passaretti
Davide Passaretti

Reputation: 2771

As @beginneR says, you can use tidyr::separate. Here is an example taken from:http://blog.rstudio.org/2014/07/22/introducing-tidyr/

head(tidier, 8)

#>   id       trt     key    time
#> 1  1 treatment work.T1 0.08514
#> 2  2   control work.T1 0.22544
#> 3  3 treatment work.T1 0.27453
#> 4  4   control work.T1 0.27231
#> 5  1 treatment home.T1 0.61583
#> 6  2   control home.T1 0.42967
#> 7  3 treatment home.T1 0.65166
#> 8  4   control home.T1 0.56774

tidy <- tidier %>%
  separate(key, into = c("location", "time"), sep = "\\.") 
tidy %>% head(8)
#>   id       trt location time    time
#> 1  1 treatment     work   T1 0.08514
#> 2  2   control     work   T1 0.22544
#> 3  3 treatment     work   T1 0.27453
#> 4  4   control     work   T1 0.27231
#> 5  1 treatment     home   T1 0.61583
#> 6  2   control     home   T1 0.42967
#> 7  3 treatment     home   T1 0.65166
#> 8  4   control     home   T1 0.56774

Upvotes: 2

Related Questions