Ale
Ale

Reputation: 1004

Nonlinear regression with sampling weights (package survey)

I would like to estimate the coefficients of a nonlinear model with a binary dependent variable. The nonlinearity arises because two regressors, A and B, depend on a subset of the dataset and on the two parameters lambda1 and lambda2 respectively:

y = alpha + beta1 * A(lambda1) + beta2 * B(lambda2) + delta * X + epsilon

where for each observation i, we have

enter image description here

Where a and Rs are variables in the data.frame. The regressor B(lambda2) is defined in a similar way.

Moreover, I need to include what in Stata are known as pweights, i.e. survey weights or sampling weights. For this reason, I'm working with the R package survey by Thomas Lumley.

First, I create a function for A (and B), i.e.:

A <- function(l1){
  R <- as.matrix(data[,1:(80)])
  a <- data[,169]
  N = length(a)
  var <- numeric(N)
  for (i in 1:N) {
   ai <- rep(a[i],a[i]-1) # vector of a(i)
   k <- 1:(a[i]-1) # numbers from 1 to a(i)-1
   num <- (ai-k)^l1
   den <- sum((ai-k)^l1)
   w <- num/den
   w <- c(w,rep(0,dim(R)[2]-length(w)))
   var[i] <- R[i,] %*% w
  }
  return(var)
}

B <- function(l2){
 C <- as.matrix(data[,82:(161-1)])
 a <- data[,169]
 N = length(a)
 var <- numeric(N)
 for (i in 1:N) {
  ai <- rep(a[i],a[i]-1) # vector of a(i)
  k <- 1:(a[i]-1) # numbers from 1 to a(i)-1
  num <- (ai-k)^l2
  den <- sum((ai-k)^l2)
  w <- num/den
  w <- c(w,rep(0,dim(C)[2]-length(w)))
  var[i] <- C[i,] %*% w
 }
 return(var)
}

But the problem is that I don't know how to include the nonlinear regressors in the model (or in the survey design, using the function svydesign):

d_test <- svydesign(id=~1, data = data, weights = ~data$hw0010)

Because, when I try to estimate the model:

# loglikelihood function:
LLsvy <- function(y, model, lambda1, lambda2){
  aux1 <- y * log(pnorm(model))
  aux2 <- (1-y) * log(1-pnorm(model))
  LL <- (aux1) + (aux2)
  return(LL)
}

fit <- svymle(loglike=LLsvy, 
              formulas=list(~y, model = ~ A(lambda1)+B(lambda2)+X,lambda1=~1,lambda2=~1),
              design=d_test, 
              start=list(c(0,0,0,0),c(lambda1=11),c(lambda2=8)),
              na.action="na.exclude")

I get the error message:

Error in eval(expr, envir, enclos) : object 'lambda1' not found

I think that the problem is in including the nonlinear part, because everything works fine if I fix A and B for some lambda1 and lambda2 (so that the model becomes linear):

lambda1=11
lambda2=8
data$A <- A(lambda1)
data$B <- B(lambda2)
d_test <- svydesign(id=~1, data = data, weights = ~data$hw0010)

LLsvylin <- function(y, model){
  aux1 <- y * log(pnorm(model))
  aux2 <- (1-y) * log(1-pnorm(model))
  LL <- (aux1) + (aux2)
  return(LL)
}

fitlin <- svymle(loglike=LLsvylin,
                 formulas=list(~y, model = ~A+B+X),
                 design=d_test,
                 start=list(0,0,0,0),
                 na.action="na.exclude")

On the contrary, if I don't use the sampling weights, I can easily estimate my nonlinear model using the function mle from package stats4 or the function mle2 from package bbmle.

To sum up, how can I combine sampling weights (svymle) while estimating a nonlinear model (which I can do using mle or mle2)?

=========================================================================

A problem with the nonlinear part of the model arises also when using the function svyglm (with fixed lambda1 and lambda2, in order to get good starting values for svymle):

lambda1=11
lambda2=8
model0 = y ~ A(lambda1) + B(lambda2) + X

probit1 = svyglm(formula = model0,
                     data = data,
                     family = binomial(link=probit),
                     design = d_test)

Because I get the error message:

Error in svyglm.survey.design(formula = model0, data = data, family = binomial(link = probit),  : 
  all variables must be in design= argument

Upvotes: 4

Views: 628

Answers (2)

Thomas Lumley
Thomas Lumley

Reputation: 31

The upcoming version 4 of the survey package will have a function svynls, so if you know how to fit your model without sampling weights using nls you will be able to fit it with sampling weights.

Upvotes: 1

Thomas Lumley
Thomas Lumley

Reputation: 106

This isn't what svymle does -- it's for generalised linear models, which have linear predictors and a potentially complicated likelihood or loss function. You want non-linear weighted least squares, with a simple loss function but complicated predictors.

There isn't an implementation of design-weighted nonlinear least squares in the survey package, probably because no-one has previously asked for one. You could try emailing the package author.

Upvotes: 1

Related Questions