Reputation: 799
What I am trying to do is to make a non-linear regression using possible submodels of my full model and then choose the most apropriate model using AIC criterion. The problem is to generate all possible submodels and then apply them to nls
function to find the best fit.
Let's say I have a data:
x <- rnorm(100)
y <- 1+x+x^2-x^3-x^4+rnorm(100, sd=0.1)
And the full formula as function of the variable x
and some parameters a
, b
, c
, d
, e
:
full <- function(x, a, b, c, d, e){
return(a + b*x + c*x^2 + d*x^3 + e*x^4)
}
(I know it is a silly example of non-linear model and I could use data transformation + linear model for it, but I want it to be simple)
I want to generate all possible submodels, skipping some parameters. I tried to just set those skipped parameters to zero:
skip <- function(args){
# args = subset of c("a", "b", "c", "d", "e")
return (function(x, a=0, b=0, c=0, d=0, e=0) {
par <- c("a", "b", "c", "d", "e")
parameters <- lapply(par, function(p){
if(p %in% args){
return (0)
}
else{
return (get(p))
}
})
names(parameters) <- c("a", "b", "c", "d", "e")
return (with(parameters, a + b*x + c*x^2 + d*x^3 + e*x^4))
})
}
And I write a function to apply those formulas to nls
:
apply_nls <- function(func, start){
fit <- nls(y~func(x, a, b, c, d, e),
start=start)
return(fit)
}
The problem is that it does not work. If I do specify the starting value for ommited parameters:
apply_nls(skip("e"), start=list(a=1, b=1, c=1, d=-1, e=-1))
then I got an error message
singular gradient matrix at initial parameter estimates
(because indeed, my function do not depend on e
parameter).
But when I do not specify starting values for b
and d
(I should be able to do it, because I specified the default vales of those parameters inside skip
):
apply_nls(skip("e"), start=list(a=1, b=1, c=1, d=-1))
Then I got another error message:
parameters without starting value in 'data': e
I suppose I should restrict the parameters in skip
and/or in apply_nls
functions so they take only the parameters I need at that time, like:
apply_nls <- function(func, args, start){
fit <- nls(y~func(x, args),
start=start)
return(fit)
}
But it does not work and I do not know how to properly implement it.
Upvotes: 1
Views: 169
Reputation: 799
If anyone is interested, I've solved this problem.
The function apply_nls
works when it is in the form:
apply_nls <- function(func, par, start){
fit <- nls(y~do.call(func, args=append(list(x=x), mget(par))),
start=start)
return(fit)
}
Here mget
returns value of each parameter given the parameter name (as a string) and do.call
enable to feed the func
with resulting arguments.
This func
is a function (sub-formula) after skipping some parameters, par
are the remaining parameters and start
are starting values for those parameters. So application of apply_nls
looks like this:
apply_nls(skip("e"), par=c("a", "b", "c", "d"), start=list(a=1, b=1, c=1, d=-1))
To get all fits for submodels:
1) I assign parameters names and start values for all of them
parameters <- c("a", "b", "c", "d", "e")
start <- list(a=1, b=1, c=1, d=-1, e=-1)
2) I make a list of all combination of dropped parameters
drops <- append(NA, c(parameters,
combn(parameters, 2, simplify=F),
combn(parameters, 3, simplify=F),
combn(parameters, 4, simplify=F)))
3) I write two functions that would return remaining parameters or start values given the parameters to drop:
choose_starts <- function(args, start){
return(start[!(names(start) %in% args)])
}
choose_pars <- function(args, all_pars){
return(all_pars[!all_pars %in% args])
}
4) I create all combinations of formulas, parameters and starting values given the skipped parameters:
all_formulas <- lapply(drops, skip)
all_starts <- lapply(drops, choose_starts, start)
all_pars <- lapply(drops, choose_pars, parameters)
5) I fit the nonlinear models for all the above.
all_fits <- mapply(apply_nls, all_formulas, all_pars, all_starts, SIMPLIFY=F)
Upvotes: 1