cianius

Reputation: 2412

How do I convert this for loop into something cooler, like by, in R?

uniq <- unique(file[,12])
pdf("SKAT.pdf")
for(i in 1:length(uniq)) {
    dat <- subset(file, file[,12] == uniq[i])
    names <- paste("Sample_filtered_on_", uniq[i], sep="")
    qq.chisq(-2*log(as.numeric(dat[,10])), df = 2, main = names, pvals = T,
        sub=subtitle)
}
dev.off()

file[,12] is an integer column, so I convert it to a factor when I run this with by instead of a for loop, as follows:

pdf("SKAT.pdf")
by(file, as.factor(file[,12]), function(x) {
    qq.chisq(-2*log(as.numeric(x[,10])), df = 2,
        main = paste("Sample_filtered_on_", file[1,12], sep=""),
        pvals = T, sub=subtitle)
})
dev.off()

This works fine for splitting the data frame by this (now factor) column. My problem is the plot title: I want to label each plot with the matching value from that column. In the for loop this is easy with uniq[i]. How do I do it inside by?

Hope this makes sense.

Upvotes: 1

Views: 85

Answers (1)

Martin Morgan

Reputation: 46876

A more vectorized (== cooler?) version would pull the common operations out of the loop and let R do the book-keeping about unique factor levels.

dat <- split(-2 * log(as.numeric(file[,10])), file[,12])
names(dat) <- paste0("Sample_filtered_on_", names(dat))
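
To see what split hands back, here is a minimal sketch on made-up data. The data frame toy and its column names pval and group are assumptions for illustration only (they stand in for columns 10 and 12 of file), and the prefix is the one from the question's plot titles.

```r
# Toy stand-in for `file`; the column names and values are made up.
toy <- data.frame(pval  = c(0.5, 0.1, 0.9, 0.2, 0.7),
                  group = c(3L, 1L, 3L, 1L, 2L))

# Transform the whole column once, then let split() do the grouping.
dat <- split(-2 * log(toy$pval), toy$group)
names(dat) <- paste0("Sample_filtered_on_", names(dat))

names(dat)    # one element per distinct group value, in sorted level order
lengths(dat)  # how many p-values landed in each group
```

The integer grouping column is turned into a factor by split itself, so there is no need for an explicit as.factor, and the group labels end up in names(dat) where the plot titles can pick them up.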

paste0 is a convenience function for the common case where one would otherwise use paste with the argument sep="". The for loop is entirely appropriate when you're running it for its side effects (plotting pretty pictures) rather than trying to capture values for further computation. It's definitely un-cool to use T instead of TRUE (T is an ordinary variable that can be reassigned, whereas TRUE is a reserved word), and seq_along(dat) means that your code won't produce unexpected results when length(dat) == 0, where 1:length(dat) would iterate over c(1, 0).

pdf("SKAT.pdf")
for(i in seq_along(dat)) {
    vals <- dat[[i]]
    nm <- names(dat)[[i]]
    qq.chisq(vals, main = nm, df = 2, pvals = TRUE, sub=subtitle)
}
dev.off()
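
The point about zero-length input is easy to check for yourself. A small sketch, using an empty list (the name empty is just for illustration) and counting how many times each style of loop runs:

```r
empty <- list()

1:length(empty)    # 1:0 counts down from 1 to 0, giving two values
seq_along(empty)   # integer(0), so nothing to iterate over

# Count the iterations each form actually performs.
n_colon <- 0
for (i in 1:length(empty)) n_colon <- n_colon + 1

n_along <- 0
for (i in seq_along(empty)) n_along <- n_along + 1

c(colon = n_colon, seq_along = n_along)
```

With 1:length(empty) the loop body runs twice on indices that do not exist, which in the plotting loop would surface as a subscript error from dat[[i]]; with seq_along the loop is simply skipped.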

If you did want to capture values, the basic observation is that your function takes two arguments that vary: the values and the plot title. So by or tapply or sapply or ... are not appropriate; each of these assumes that just a single argument is varying. Instead, use mapply or the comparable Map:

Map(qq.chisq, dat, main=names(dat),
    MoreArgs=list(df=2, pvals=TRUE, sub=subtitle))
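
Since qq.chisq needs a package and a graphics device, here is a rough sketch of the same Map pattern with a stand-in. The function describe and its digits argument are hypothetical, made up so that the example runs in plain R and returns something you can inspect.

```r
# Hypothetical stand-in for qq.chisq: takes the values and a title plus some
# fixed arguments, and returns a one-line summary instead of drawing a plot.
describe <- function(x, main, df, digits) {
    paste0(main, ": n = ", length(x),
           ", mean = ", round(mean(x), digits), ", df = ", df)
}

dat <- list(Sample_filtered_on_1 = c(1.2, 3.4),
            Sample_filtered_on_2 = c(0.7, 2.1, 5.5))

# dat and names(dat) are walked through in parallel;
# everything in MoreArgs is passed unchanged to every call.
res <- Map(describe, dat, main = names(dat),
           MoreArgs = list(df = 2, digits = 1))

str(res)
```

Map returns a list with one element per group, and it carries over the names of dat, so the captured values stay labelled by the same titles.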

Upvotes: 2
