chris wills

Reputation: 23

How to speed up for loop/append to vector in R

I have ~11,000,000 rows in a data frame. I need to loop through each row, do a small calculation, and then retrieve the corresponding p-value from a chi-squared distribution using pchisq(). Each time a value is retrieved, it is appended to an initially empty vector, which is later added to the data frame as a new column.

This code is very inefficient and took exactly a week to run on the server. I believe that is because append() has to copy the whole vector on every iteration. How can I make this as efficient as possible?

Here is the current loop:

std_err <- NULL
for (i in 1:nrow(father)){
  std_err <- append(std_err, pchisq((mother[i,7]-father[i,7])^2/((mother[i,8])^2 + (father[i,8])^2), df=1, lower.tail = F))
}


father[ ,"p_std_err"] <- std_err
write.table(father, "father+standard_error.sumstats", sep = '\t', col.names = T, row.names = F, quote = F)

Upvotes: 0

Views: 442

Answers (1)

Mikko Marttila

Reputation: 11878

pchisq() is vectorized, and so are the arithmetic operators, so you don't need a loop at all. Pass whole columns instead of single elements:

pchisq((mother[, 7] - father[, 7])^2 / (mother[, 8]^2 + father[, 8]^2), df = 1, lower.tail = FALSE)
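To check this on something small before running it on the full data, here is a minimal sketch. The toy data frames, the make_df() helper, and the column names beta and se are made up for illustration; the only thing taken from the question is that column 7 holds the estimate and column 8 its standard error. It also shows a preallocated loop, which avoids the repeated copying that append() causes if you ever do need a loop:

```r
# Toy stand-ins for the real data (hypothetical values).
# Columns 1-6 are filler; column 7 is the estimate, column 8 its standard error.
set.seed(1)
n <- 1000
make_df <- function(n) {
  df <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
  df$beta <- rnorm(n)           # column 7
  df$se <- runif(n, 0.1, 1)     # column 8
  df
}
mother <- make_df(n)
father <- make_df(n)

# Vectorized: a single call that works on whole columns at once.
p_vec <- pchisq((mother[, 7] - father[, 7])^2 / (mother[, 8]^2 + father[, 8]^2),
                df = 1, lower.tail = FALSE)

# Loop with a preallocated result vector, for comparison only.
p_loop <- numeric(n)
for (i in seq_len(n)) {
  p_loop[i] <- pchisq((mother[i, 7] - father[i, 7])^2 /
                        (mother[i, 8]^2 + father[i, 8]^2),
                      df = 1, lower.tail = FALSE)
}

# Both approaches should give the same p-values.
stopifnot(isTRUE(all.equal(p_vec, p_loop)))

# Attach the result as a new column, as in the question.
father[, "p_std_err"] <- p_vec
```

On the real data, only the pchisq() call and the final assignment are needed; the loop is there just to confirm the two approaches agree.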

Upvotes: 6
