Daniel2805
Daniel2805

Reputation: 77

Two-level modelling with lme in R

I am interested in estimating a mixed effect model with two random components (I am sorry for the somewhat unprecise notation. I am somewhat new to these kind of models). Finally, I also want also the standard errors of the variances of the random components. That is why I am somewhat boudn to using the package lme. The reason is that I found this description on how to calculate those standard errors and also interesting, the standard error for function of these variances link.

I believe I know how to use the package lmer. I am finally interested in model2. For the model1, both command yield the same estimates. But model2 with lme yields different results than model2 with lmer from the lme4 package. Could you help me to get around how to set up the random components for lme? This would be much appreciated. Thanks. Please find attached my MWE.

Best

Daniel

#### load all packages #####

loadpackage <- function(x){
  for( i in x ){
    #  require returns TRUE invisibly if it was able to load package
    if( ! require( i , character.only = TRUE ) ){
      #  If package was not able to be loaded then re-install
      install.packages( i , dependencies = TRUE )
    }
    #  Load package (after installing)
    library( i , character.only = TRUE )
  }
}

#  Then try/install packages...

loadpackage( c("nlme", "msm", "lmeInfo", "lme4"))
alcohol1 <- read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/alcohol1_pp.txt", header=T, sep=",")
attach(alcohol1)

id <- as.factor(id)
age <- as.factor(age)


model1.lmer <-lmer(alcuse ~ 1  + peer + (1|id))
summary(model1.lmer)
model2.lmer <-lmer(alcuse ~ 1  + peer + (1|id) + (1|age))
summary(model2.lmer)

model1.lme <- lme(alcuse ~ 1+ peer, data = alcohol1, random = ~ 1 |id, method ="REML")
summary(model1.lme)
model2.lme <- lme(alcuse ~ 1+ peer, data = alcohol1, random = ~ 1 |id + 1|age, method ="REML")

Edit (15/09/2021):

Estimating the model as follows end then returning the estimates via nlme::VarCorr gives me different results. While the estimates seem to be in the ball park, it is as they are switched across components.

model2a.lme <- lme(alcuse ~ 1+ peer, data = alcohol1, random = ~ 1 |id/age, method ="REML")
summary(model2a.lme)

nlme::VarCorr(model2a.lme)
            Variance     StdDev   
id =        pdLogChol(1)          
(Intercept) 0.38390274   0.6195989
age =       pdLogChol(1)          
(Intercept) 0.47892113   0.6920413
Residual    0.08282585   0.2877948

EDIT (16/09/2021):

Since Bob pushed me to think more about my model, I want to give some additional information. Please know that the data I use in the MWE do not match my true data. I just used it for illustrative purposes since I can not upload myy true data. I have a household panel with income, demographic informations and parent indicators.

I am interested in intergenerational mobility. Sibling correlations of permanent income are one industry standard. At the very least, contemporanous observations are very bad proxies of permanent income. Due to transitory shocks, i.e., classical measurement error, those estimates are most certainly attenuated. For this reason, we exploit the longitudinal dimension of our data.

For sibling correlations, this amounts to hypothesising that the income process is as follows:

$$Y_{ijt} = \beta X_{ijt} + \epsilon_{ijt}.$$

With Y being income from individual i from family j in year t. X comprises age and survey year indicators to account for life-cycle effects and macroeconmic conditions in survey years. Epsilon is a compund term comprising a random individual and family component as well as a transitory component (measurement error and short lived shocks). It looks as follows:

$$\epsilon_{ijt} = \alpha_i + \gamma_j + \eta_{ijt}.$$

The variance of income is then:

$$\sigma^2_\epsilon = \sigma^2_\alpha + \sigma^2\gamma + \sigma^2\eta.$$

The quantitiy we are interested in is

$$\rho = \frac(\sigma^2\gamma}{\sigma^2_\alpha + \sigma^2\gamma},$$

which reflects the share of shared family (and other characteristics) among siblings of the variation in permanent income.

B.t.w.: The struggle is simply because I want to have a standard errors for all estimates and for \rho.

Upvotes: 1

Views: 842

Answers (3)

Daniel2805
Daniel2805

Reputation: 77

Okay, finally. Just to sketch my confidential data: I have a panel of individuals. The data includes siblings, identified via mnr. income is earnings, wavey survey year, age age factors. female a factor for gender, pid is the factor identifying the individual.

m1 <- lmer(income ~ age + wavey + female + (1|pid) + (1 | mnr),
        data = panel)
vv <- vcov(m1, full = TRUE)

covvar <- vv[58:60, 58:60]
covvar
3 x 3 Matrix of class "dgeMatrix"
     cov_pid.(Intercept) cov_mnr.(Intercept)   residual
[1,]           2.6528679          -1.4624588 -0.4077576
[2,]          -1.4624588           3.1015001 -0.0597926
[3,]          -0.4077576          -0.0597926  1.1634680

mean <- as.data.frame(VarCorr(m1))$vcov
mean
[1] 17.92341 16.86084 56.77185
deltamethod(~ x2/(x1+x2), mean, covvar, ses =TRUE)
[1] 0.04242089

The last scalar should be what I interprete as the shared background of the siblings of permanent income.

Thanks to @Ben Bolker who pointed me into this direction.

Upvotes: 0

Ben Bolker
Ben Bolker

Reputation: 226162

This is an example of crossed vs nested random effects. (Note that the example you refer to is fitting a different kind of model, a random-slopes model rather than a model with two different grouping variables ...)

If you try with(alcohol1, table(age,id)) you can see that every id is associated with every possible age (14, 15, 16). Or subset(alcohol1, id==1) for example:

  id age coa male age_14   alcuse     peer     cpeer  ccoa
1  1  14   1    0      0 1.732051 1.264911 0.2469111 0.549
2  1  15   1    0      1 2.000000 1.264911 0.2469111 0.549
3  1  16   1    0      2 2.000000 1.264911 0.2469111 0.549

There are three possible models you could fit for a model with random effects of age(indexed by i) and id (indexed by j)

  • crossed ((1|age) + (1|id)): Y_{ij} = beta0 + beta1*peer + eps1_i + eps2_j +epsr_{ij}; alcohol use varies among individuals and, independently, across ages (this model won't work very well because there are only three distinct ages in the data set, more levels are usually needed)
  • id nested within age ((1|age/id) = (1|age) + (1|age:id)):
    Y_{ij} = beta0 + beta1*peer + eps1_i + eps2_{ij} + epsr_{ij}; alcohol use varies across ages, and varies across individuals within ages (see note above about number of levels).
  • age nested within id ((1|id/age) = (1|id) + (1|age:id)):
    Y_{ij} = beta0 + beta1*peer + eps1_j + eps2_{ij} + epsr_{ij}; alcohol use varies across individuals, and varies across ages within individuals

Here eps1_i, eps2_{ij}, and epsr_{ij} are normal deviates; epsr is the residual error term.

The latter two models actually don't make sense in this case; because there is only one observation per age/id combination, the nested variance (eps2) is completely confounded with the residual variance (epsr). lme doesn't notice this; if you try to fit one of the nested models in lmer it will give an error that

number of levels of each grouping factor must be < number of observations (problems: id:age)

(Although if you try to compute confidence intervals based on model1.lme you'll get an error "cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance", which is a hint that something is wrong.)

You could restate this problem as saying that the residual variation, and the variation among ages within individuals, are jointly unidentifiable (can't be separated from each other, statistically).

The updated answer here shows how to get the standard errors of the variance components from an lmer model, so you shouldn't be stuck with lme (but you should think carefully about which model you're really trying to fit ...)

The GLMM FAQ might also be useful.

More generally, the standard error of

rho = (V_gamma)/(V_alpha + V_gamma)

will be hard to compute accurately, because this is a nonlinear function of the model parameters. You can apply the delta method, but the most reliable approach would be to use parametric bootstrapping: if you have a fitted model m, then something like this should work:

var_ratio <- function(m) {
   v <- as.data.frame(sapply(VarCorr(m), as.numeric))
   return(v$family/(v$family + v$id))
}
confint(m, method="boot", FUN =var_ratio)

Upvotes: 2

Kra.P
Kra.P

Reputation: 15123

You should specify random effects in lme by using / not +

By lmer

model2.lmer <-lmer(alcuse ~ 1  + peer + (1|id) + (1|age), data = alcohol1)
summary(model2.lmer)

Linear mixed model fit by REML ['lmerMod']
Formula: alcuse ~ 1 + peer + (1 | id) + (1 | age)
   Data: alcohol1

REML criterion at convergence: 651.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.0228 -0.5310 -0.1329  0.5854  3.1545 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.08078  0.2842  
 age      (Intercept) 0.30313  0.5506  
 Residual             0.56175  0.7495  
Number of obs: 246, groups:  id, 82; age, 82

Fixed effects:
            Estimate Std. Error t value
(Intercept)   0.3039     0.1438   2.113
peer          0.6074     0.1151   5.276

Correlation of Fixed Effects:
     (Intr)
peer -0.814

By lme

model2.lme <- lme(alcuse ~ 1+ peer, data = alcohol1, random = ~ 1 |id/age, method ="REML")
summary(model2.lme)

Linear mixed-effects model fit by REML
 Data: alcohol1 
       AIC      BIC    logLik
  661.3109 678.7967 -325.6554

Random effects:
 Formula: ~1 | id
        (Intercept)
StdDev:   0.4381206

 Formula: ~1 | age %in% id
        (Intercept)  Residual
StdDev:   0.4381203 0.7494988

Fixed effects: alcuse ~ 1 + peer 
                Value Std.Error  DF  t-value p-value
(Intercept) 0.3038946 0.1438333 164 2.112825  0.0361
peer        0.6073948 0.1151228  80 5.276060  0.0000
 Correlation: 
     (Intr)
peer -0.814

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.0227793 -0.5309669 -0.1329302  0.5853768  3.1544873 

Number of Observations: 246
Number of Groups: 
         id age %in% id 
         82          82 

Upvotes: 0

Related Questions