user3745089

Reputation: 149

Loop over levels of a factor variable in a function

I have a data frame, dat, with a covariate, site, coded as a factor with 31 levels. A few rows look like this:

cas_1_sitea_586754968 0 0 1 2 0 sitea 
con_65_sitea_568859302 1 0 2 1 1 siteb
cas_9_siteb_0799700 0 0 0 0 0 siteb 
con_siteb_THR84569 2 0 0 1 0 sitea

I have a function that works when I apply it to one site at a time:

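get_maf <- function(data){
    # allele counts per genotype column (all but the last two columns)
    allele.count <- apply(data[,1:(ncol(data)-2)],2,sum)
    # allele frequency: count / (2 alleles per sample)
    maf <- allele.count/(2*nrow(data))
    # output file named after the site, e.g. "sitea_jp.maf"
    out <- paste((unique(data$site)),"_jp.maf",sep="")
    write.table(maf, out, col.names=FALSE, quote=FALSE)
}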

But when I try to loop over all 31 sites with lapply like this:

lapply(unique(dat$site), get_maf, data = dat)    

I get an error (in my actual code the data frame is called jp and the function get_maf_jp):

lapply(unique(jp$site), get_maf_jp, data = jp)
Error in FUN(c("aber", "ajsz", "asrb", "buls", "cati", "caws", "cims", : unused argument (c("aber", "ajsz", "asrb", "buls", "cati", "caws", "cims", "clo3", "cou3", "denm", "dubl", "edin", "egcu", "ersw", "gras", "irwt", "lie2", "lie5", "mgs2", "msaf", "munc", "pewb", "pews", "s234", "swe1", "swe5", "swe6", "top8", "ucla", "umeb", "umes")[[1]])

Any insights into what I am doing wrong here are greatly appreciated.

Upvotes: 0

Views: 727

Answers (1)

Marat Talipov

Reputation: 13304

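The problem with lapply(unique(dat$site), get_maf, data = dat) is that lapply calls get_maf(X[[i]], data = dat) for each site, i.e. it passes two arguments: the site value from lapply and data = dat. Since get_maf has only one parameter, data, and that parameter is already matched by name, the site value has nowhere to go and R reports it as an unused argument. You can fix it by subsetting the data inside an anonymous function: lapply(unique(dat$site), function(s) get_maf(data = dat[dat$site == s, ])).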
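The argument matching can be reproduced with a toy example. This is a minimal sketch: `f`, `f_site` and the data frame `d` are hypothetical stand-ins for get_maf and dat, not part of the original code.

```r
# A one-argument function, like get_maf(data)
f <- function(data) nrow(data)

# Hypothetical toy data with a site column
d <- data.frame(x = 1:4, site = c("a", "a", "b", "b"))

# lapply(X, FUN, ...) calls FUN(X[[i]], ...), so this runs f("a", data = d).
# `data` is already matched by name, so "a" has no parameter left to bind to.
res <- tryCatch(lapply(unique(d$site), f, data = d),
                error = function(e) conditionMessage(e))
print(res)  # the "unused argument" error message

# Hypothetical helper that takes the site as its first parameter:
# lapply fills `s` positionally and `data` by name, so nothing is left over.
f_site <- function(s, data) nrow(data[data$site == s, ])
counts <- lapply(unique(d$site), f_site, data = d)
print(counts)
```

Putting the looped-over value in the first parameter is what lets lapply hand it over positionally while the extra arguments in `...` are matched by name.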

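Alternatively, with dplyr you can run get_maf once per site. Note that group_by alone is not enough: piping a grouped data frame straight into get_maf still calls it once on all rows. group_walk calls a function on each group for its side effects:

library(dplyr)
# .keep = TRUE keeps the site column in each group, which get_maf relies on
dat %>% group_by(site) %>% group_walk(~ get_maf(.x), .keep = TRUE)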
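A base-R route that avoids both the argument-matching problem and the dplyr dependency is to split the data frame by site first, so each piece arrives as get_maf's only argument. This is a sketch under assumptions: the toy data below is hypothetical and only mimics the layout get_maf expects (genotype columns first, then two non-genotype columns, one of them site), and the files are written to tempdir() purely for illustration.

```r
# get_maf as posted in the question (F spelled out as FALSE)
get_maf <- function(data) {
  allele.count <- apply(data[, 1:(ncol(data) - 2)], 2, sum)
  maf <- allele.count / (2 * nrow(data))
  out <- paste(unique(data$site), "_jp.maf", sep = "")
  write.table(maf, out, col.names = FALSE, quote = FALSE)
}

# Hypothetical toy data: three genotype columns, then two non-genotype
# columns (the last being site), matching the ncol(data) - 2 slice
dat <- data.frame(
  snp1 = c(0, 1, 0, 2),
  snp2 = c(0, 0, 0, 0),
  snp3 = c(1, 2, 0, 0),
  status = c("cas", "con", "cas", "con"),
  site = factor(c("sitea", "sitea", "siteb", "siteb"))
)

# Write the .maf files to a temporary directory, not the working directory
old_wd <- setwd(tempdir())

# split() gives one data frame per site (drop = TRUE skips empty levels);
# lapply then passes each piece as get_maf's only argument, `data`
invisible(lapply(split(dat, dat$site, drop = TRUE), get_maf))

setwd(old_wd)
print(read.table(file.path(tempdir(), "sitea_jp.maf")))
```

Because split() already names each piece by site, the per-site subsetting happens once up front rather than inside an anonymous function.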

PS: if you're dealing with large data sets, consider using allele.count <- colSums(data[,1:(ncol(data)-2)]) in get_maf instead of the much slower allele.count <- apply(data[,1:(ncol(data)-2)],2,sum) you have now. colSums is implemented in C, whereas apply calls sum once per column from R.
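A quick way to check that the two give the same totals, and to compare their speed on your own machine, is sketched below. The genotype matrix is random, hypothetical data coded 0/1/2; actual timings will vary.

```r
set.seed(1)
# Hypothetical genotype data: 2000 samples x 500 SNPs coded 0/1/2
geno <- as.data.frame(matrix(sample(0:2, 2000 * 500, replace = TRUE),
                             nrow = 2000))

t_apply   <- system.time(a <- apply(geno, 2, sum))
t_colsums <- system.time(b <- colSums(geno))

# Same column totals (apply returns integers here, colSums returns doubles)
print(all.equal(a, b))

# Elapsed seconds for each approach
print(c(apply = t_apply[["elapsed"]], colSums = t_colsums[["elapsed"]]))
```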

Upvotes: 1
