cooldood3490
cooldood3490

Reputation: 2488

How can I speed up my R code?

I'm looping through a large character matrix called geno.data. I'm essentially looping two columns of the matrix at a time. It's running very slowly. At the end of the loop, I plan to export EIGENSTRAT_genofile to a .txt file. I use Xa in the rest of the program. How can I make it faster? Thanks

geno.data <- matrix(nrow = 313, ncol = 300000, c("a","c","g")) # large matrix
Num_of_SNPs<-ncol(geno.data) /2

compute_MAF<- function(j){
    loci<- NULL
    loci<- table(as.vector(geno.data[,c(2*j-1,2*j)]))
    total_alleles<- sum(loci)
    loci<- loci/total_alleles

    # major and minor allele frequencies for one locus
    major_allele<- names(which.max(loci))
    minor_allele<- names(which.min(loci))

    return(cbind(major_allele, minor_allele))
}

Xa <- matrix(NA, nrow = nrow(geno.data), ncol = Num_of_SNPs)
EIGENSTRAT_genofile<-c()

for (j in 1:Num_of_SNPs){
    allele<-compute_MAF(j)
    X <- 1 * (geno.data[,c(2*j-1, 2*j)] == allele[2])  # minor allele
    reference_allele_count <- rowSums(geno.data[,c(2*j-1,2*j)]==allele[1], na.rm=TRUE)
    EIGENSTRAT_genofile<- rbind(EIGENSTRAT_genofile,reference_allele_count)
    Xa[,j] <- X[,1] + X[,2] - 1
}

Upvotes: 0

Views: 82

Answers (1)

nicola
nicola

Reputation: 24480

Although there are improvements that could be made all over your code, the biggest bottleneck is represented by:

    EIGENSTRAT_genofile<- rbind(EIGENSTRAT_genofile,reference_allele_count)

You should always avoid growing objects inside a loop. Rather, try to initialize a list before the loop:

    EIGENSTRAT_genofile<-vector("list",Num_of_SNPs)

Inside the loop, you can assign reference_allele_count to an element of EIGENSTRAT_genofile:

    EIGENSTRAT_genofile[[j]]<-reference_allele_count

Then, at the end after the loop, you rbind all the elements of the list together through do.call:

   EIGENSTRAT_genofile<-do.call(rbind, EIGENSTRAT_genofile)

Upvotes: 1

Related Questions