Reputation: 97
I'm building a function with multiple steps, where an object is created at each step. One step (temp3) fails because it cannot find the previous step's object (Error: object 'temp2' not found). I'm not sure why: I have similar functions that follow EXACTLY the same structure, each step building on the previously created object within the function, and they run fine. When I run the code outside of the function it works (so the code seems fine), and using debug() I can see that the step that is supposedly not creating the data (temp2) actually stores it in the local memory (so I can see the object "temp2"), but for some reason R does not seem to recognize or use it. I'm stumped! Maybe I'm just not getting how R evaluates steps and recalls objects within the local memory? Have I gone about writing the function in the wrong way?
I can easily prepare a worked example if it would be of more use, since this function calls some unusual packages, but at the moment I think it's more an issue of my misunderstanding how R assigns objects to the local memory within a function. A similar query is here, How does R handle object in function call?, but in my case I am assigning each new object within the function. Could you please help?
glm.random<-function(df){
  reps=5
  output<-matrix(NA, ncol=1, nrow=0)
  while (length(output[,1])<reps) {
    temp1 <- ddply(df,.(study_id),randomRows,1)
    temp2 <- subset(temp1,select = c(continent,taxatype, metric,nullm, yi_pos))
    temp3 <- glmulti(yi_pos ~ ., data = temp2, family = gaussian( link = log), crit = aic, plotty = F, report = F)
    temp4 <- noquote(paste(summary(temp3)$bestmodel[1]))
    output<-rbind(output,temp4)
  }
  write.table(output, "output.glm.random1.txt", append=TRUE, sep="\t", quote=FALSE)
}
In Reply:
Hi again,
Andrie: 1) So I should drop the use of subset (but I'm curious, what 'unexpected results' do you refer to?). 2) I have been finding it difficult with the data at hand, but I see your point and need to improve my coding approach here. 3) Good tip! But here it's done just to check that things are working; I would likely just use that output object for more analysis.
Gavin: 1) Will do! 2+3) So it seems the error lies in creating (or recalling) 'temp1'.
Below is what I hope is some reproducible code. If it helps, the approach I'm trying to duplicate is found in Gibson et al. 2011 Nature 478:378. (See Detailed Methods "Generalized linear models.").
Thank you!
#rm(list = ls())
library("plyr")
library("glmulti")
# random rows function
randomRows = function(df,n){
return(df[sample(nrow(df),n),])
}
# Dataframe example
study_id <- c(1,1,1,1,2,2,3,3,3,4)
continent <- c("AF","AF","AF","AF","AF","AF", "AS", "AS", "AS", "SA")
taxatype <- c("bird","bird","bird","mam","mam","arthro", "arthro", "arthro", "arthro", "arthro")
metric<- c("sppr","sppr","sppr","sppr","abund","abund", "abund", "abund", "abund", "abund")
extra.data<- c(34:43)
yi_pos<- runif(10)
df<- data.frame(study_id=study_id, continent=continent,metric=metric, taxatype=taxatype,extra.data = extra.data, yi_pos = yi_pos)
df
# Function. Goal:repeat x10000 (but here reps =5) ( Select one random value per study_id, run glmulti{glmulti}, select best ranked model, concatenate to an output and export).
glm.random<-function(df){
  reps=5
  output<-matrix(NA, ncol=1, nrow=0)
  while (length(output[,1])<reps) {
    temp1 <- ddply(df,.(study_id),randomRows,1)
    temp3 <- glmulti(yi_pos ~ continent+taxatype+metric, data = temp1, family = gaussian( link = log), crit = aic, plotty = F, report = F)
    temp4 <- noquote(paste(summary(temp3)$bestmodel[1]))
    output<-rbind(output,temp4)
  }
  write.table(output, "output.glm.random1.txt", append=TRUE, sep="\t", quote=FALSE)
}
# run function to obtain error
glm.random(df)
# debug(glm.random)
# glm.random(df)
# undebug(glm.random)
Upvotes: 3
Views: 653
Reputation: 19454
From ?glmulti:
"If [the argument data is] not specified, glmulti will try to find the data in the environment of the formula, from the fitted model passed as y argument, or from the global environment."
However, when you specify data = temp1, glmulti apparently looks in the global environment for this object. Therefore, you may need to assign your randomly selected data to the global environment (I've renamed things a little to try and keep names and objects in check):
glm.random2<-function(df){
  reps=5
  output<-matrix(NA, ncol=1, nrow=0)
  while (length(output[,1])<reps) {
    ## Here things are different
    temp2 <- ddply(df,.(study_id),randomRows,1)
    names(temp2)[2]<-"cOntinent"
    assign("temp1",temp2,envir=.GlobalEnv)
    ## Note the slightly modified formula, to check whether
    ## glmulti looks for terms in temp1 or simply as named objects in the environment
    ## It looks like the former, which is good.
    temp3 <- glmulti(yi_pos ~ cOntinent+taxatype+metric, data = temp1,
                     family = gaussian( link = log), crit = aic, plotty = F, report = F)
    temp4 <- noquote(paste(summary(temp3)$bestmodel[1]))
    output<-rbind(output,temp4)
    ## Remove the object temp1 from the global environment
    rm(temp1,envir=.GlobalEnv)
  }
  write.table(output, "output.glm.random1.txt", append=TRUE, sep="\t", quote=FALSE)
}
# run function - no error for me!
glm.random2(df)
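One weakness of the assign()/rm() pattern is that if the model call throws an error, the temporary object is left behind in the global environment. A minimal base-R sketch of a guard using on.exit() follows. The helper name with_global and the object name tmp_demo_data are hypothetical (not part of glmulti or plyr), and a simple get() from the global environment stands in for the glmulti call:

```r
# Hypothetical helper: temporarily expose `value` in the global environment
# under `name`, evaluate `code`, then restore the previous state even if
# `code` fails.
with_global <- function(name, value, code) {
  had_old <- exists(name, envir = globalenv(), inherits = FALSE)
  if (had_old) {
    old <- get(name, envir = globalenv(), inherits = FALSE)
  }
  assign(name, value, envir = globalenv())
  on.exit({
    if (had_old) {
      # Put back whatever was there before
      assign(name, old, envir = globalenv())
    } else {
      # Nothing was there before, so remove our temporary object
      rm(list = name, envir = globalenv())
    }
  }, add = TRUE)
  force(code)  # `code` is a lazy argument; it is evaluated only now
}

demo <- function() {
  local_df <- data.frame(x = 1:3)
  # Stand-in for a call that can only see objects in the global environment
  with_global("tmp_demo_data", local_df,
              nrow(get("tmp_demo_data", envir = globalenv())))
}

n <- demo()
leftover <- exists("tmp_demo_data", envir = globalenv(), inherits = FALSE)
```

Inside glm.random2 the same idea would wrap the glmulti call in place of the explicit assign() and rm() lines; I have not tested that with glmulti itself.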
You might want to check with the package maintainer to see if this is the intended way for glmulti to work.
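To see in isolation how this kind of lookup produces "object not found" for a variable that clearly exists inside your function, here is a small base-R sketch. The functions fit_global and fit_caller are hypothetical stand-ins, not glmulti's actual implementation; the assumption is only that the data argument ends up being resolved by name in the global environment instead of in the calling function's frame:

```r
# Hypothetical: resolves the *name* of `data` in the global environment,
# ignoring the frame of the function that called us.
fit_global <- function(data) {
  data_name <- deparse(substitute(data))
  get(data_name, envir = globalenv())
}

# Hypothetical: resolves `data` in the caller's frame, which is what
# one would usually expect.
fit_caller <- function(data) {
  eval(substitute(data), parent.frame())
}

wrapper <- function() {
  local_df <- data.frame(x = 1:3)  # exists only inside wrapper()
  list(
    caller = tryCatch(fit_caller(local_df),
                      error = function(e) conditionMessage(e)),
    global = tryCatch(fit_global(local_df),
                      error = function(e) conditionMessage(e))
  )
}

res <- wrapper()
# res$caller is the data frame; res$global holds the captured error message,
# because local_df is not visible from the global environment.
```

This also matches what debug() shows: the object really is in the function's local environment, it is just not where the lookup is being done.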
Upvotes: 2