Reputation: 149
I have a dataframe, dat, with a covariate site coded as a factor with 31 different levels.
cas_1_sitea_586754968 0 0 1 2 0 sitea
con_65_sitea_568859302 1 0 2 1 1 siteb
cas_9_siteb_0799700 0 0 0 0 0 siteb
con_siteb_THR84569 2 0 0 1 0 sitea
I have a function that works when I apply it to one site variable at a time:
get_maf <- function(data){
  allele.count <- apply(data[,1:(ncol(data)-2)],2,sum)
  maf <- allele.count/(2*nrow(data))
  out <- paste((unique(data$site)),"_jp.maf",sep="")
  write.table(maf, out, col.names=F, quote=F)
}
But, when I try to loop over the data within each of the 31 sites using lapply like this:
lapply(unique(dat$site), get_maf, data = dat)
I get an error: lapply(unique(jp$site), get_maf_jp, data = jp)
Error in FUN(c("aber", "ajsz", "asrb", "buls", "cati", "caws", "cims", :
unused argument (c("aber", "ajsz", "asrb", "buls", "cati", "caws", "cims", "clo3", "cou3", "denm", "dubl", "edin", "egcu", "ersw", "gras", "irwt", "lie2", "lie5", "mgs2", "msaf", "munc", "pewb", "pews", "s234", "swe1", "swe5", "swe6", "top8", "ucla", "umeb", "umes")[[1]])
Any insights into what I am doing wrong here are greatly appreciated.
Upvotes: 0
Reputation: 13304
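The problem with the expression
lapply(unique(dat$site), get_maf, data = dat)
is that it passes two arguments to get_maf: the first comes from lapply (the current element of unique(dat$site)), and the second comes from data = dat. Since get_maf accepts only one argument, data, the element supplied by lapply is left over, which is what the "unused argument" error reports. You can fix it by subsetting the data inside an anonymous function:
lapply(unique(dat$site), function(s) get_maf(data = dat[dat$site == s, ]))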
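To make the argument matching concrete, here is a minimal sketch on made-up data. The data frame toy, its column layout, and the helper maf_by_site are hypothetical stand-ins (not the asker's real file or function); the helper returns the frequencies instead of writing them to disk and selects genotype columns by name. It also shows split() as another base R way to hand each site's rows to a one-argument function.

```r
# Hypothetical toy data; the column layout is an assumption for illustration.
toy <- data.frame(
  snp1 = c(0, 1, 0, 2),
  snp2 = c(0, 0, 0, 0),
  snp3 = c(1, 2, 0, 0),
  site = factor(c("sitea", "sitea", "siteb", "siteb"))
)

# Simplified stand-in for get_maf: returns the values rather than writing a file.
maf_by_site <- function(data) {
  geno <- data[, names(data) != "site", drop = FALSE]
  colSums(geno) / (2 * nrow(data))
}

# lapply() calls FUN(X[[i]], ...), so this amounts to
# maf_by_site(<one site>, data = toy): two arguments for a
# one-argument function. The error message is captured as a string.
res_err <- tryCatch(
  lapply(unique(toy$site), maf_by_site, data = toy),
  error = function(e) conditionMessage(e)
)

# split() builds one data frame per site; each is passed as the only argument.
res <- lapply(split(toy, toy$site), maf_by_site)
```

Here res is a named list with one numeric vector per site, so the results can be inspected before anything is written to disk.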
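Alternatively, you can use dplyr to run get_maf once per site (this assumes dplyr 1.0 or later; .keep = TRUE keeps the site column in each per-group piece, which get_maf needs to build the file name):
library(dplyr)
dat %>% group_by(site) %>% group_walk(~ get_maf(.x), .keep = TRUE)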
PS: if you're dealing with large data sets, consider using
allele.count <- colSums(data[,1:(ncol(data)-2)])
in the get_maf function instead of the much slower
allele.count <- apply(data[,1:(ncol(data)-2)],2,sum)
that you have now.
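As a quick sanity check that the two forms agree, the following sketch compares them on simulated genotype counts. The matrix dimensions and the seed are arbitrary assumptions; timings are left for you to measure on your own data, for example with system.time().

```r
# Simulated 0/1/2 genotype counts; the size and seed are arbitrary.
set.seed(1)
geno <- as.data.frame(
  matrix(sample(0:2, 5000 * 200, replace = TRUE), ncol = 200)
)

via_apply   <- apply(geno, 2, sum)  # column sums, one sum() call per column
via_colsums <- colSums(geno)        # same sums from a single vectorised call

# Both give the same named vector of per-column totals
# (apply() returns integers here, colSums() returns doubles).
same <- isTRUE(all.equal(via_apply, via_colsums))
```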
Upvotes: 1