spencer886
spencer886

Reputation: 37

Different results of linear mixed model in R (lme) and SAS (proc mixed)

I tried to fit random coefficient model using the lme function from nlme package, the example codes are below:

model_example_1 <- lme(CFB~ BASE + Dose + ADY + Dose:ADY + BASE:ADY,
                   random = ~ ADY| SubjectID, 
                   correlation = corSymm(form = ~ 1 | SubjectID),
                   data = example_data,
                   method = "REML", na.action = na.exclude,
                   control = lmeControl(msMaxIter = 2000, opt = "optim"))

Based on this code, we tried to fit the random intercept and slope model with unstructured correlation.

Meanwhile, we also fit the model in SAS with the same data, the codes are below:

proc mixed data example_data MAXITER= 2000;
    class Dose SubjectID;
    model CFB = Dose BASE ADY BASE *ADY Dose*ADY/SOLUTION cl DDFM= BW;
                random Int ADY/type=un subject=SubjectID G GCORR VCORR;
run;

The SAS codes fit the random intercept and slope model with unstructured correlation also, however, these two models give us different results

And the tricky part is that when we remove the correlation part in the R codes (based on my understanding of the help document, there is no within-group correlation), then we obtain the same results with SAS codes with unstructured correlation.

model_example_2 <- lme(CFB~ BASE + Dose + ADY + Dose:ADY + BASE:ADY,
                   random = ~ ADY| SubjectID, 
                   #correlation = corSymm(form = ~ 1 | SubjectID),
                   data = example_data,
                   method = "REML", na.action = na.exclude,
                   control = lmeControl(msMaxIter = 2000, opt = "optim"))

Anyone have any ideas?


Edit: 2025/01/14

According to Mikko's comments, now I understand that type = un in the random statement applies to the covariance structure of the random effects, while type = un in the repeated statement applies to the within-group residual covariance.

I fitted the same model_example_1 for R, and updated the SAS codes to include the unstructured within-group correlation. (add repeated /type=un subject=SubjectID R RCORR; )

proc mixed data = example_data MAXITER= 2000;
    class Dose SubjectID;
    model CFB = Dose BASE ADY BASE*ADY Dose*ADY/SOLUTION cl DDFM= BW;
    random Int ADY/type=un subject=SubjectID; 
    repeated /type=un subject=SubjectID R RCORR;  
run;

I still get different results from R, however, when I change both within-group correlation to compound symmetry, I can get nearly the same results. Any ideas for the difference for R and SAS with unstructured within-group correlation matrix?

R results for unstructured within-group correlation: enter image description here

SAS results unstructured within-group correlation:

enter image description here

Below are the R codes for simulation:

set.seed(123)
SubjectID <- paste0("ID_", 1:100)
example_data <- data.frame(SubjectID = rep(SubjectID, each = 4), ADY_fixed = rep(c(7, 30, 60, 90), 100), ADY_error = round(runif(400, -2, 2)), 
                           BASE = rep(rnorm(100, 60, 10), each = 4), Dose = rep(c("High", "Low"), each = 200), CFB = rnorm(400, 5, 10))
example_data$ADY <- example_data$ADY_fixed + example_data$ADY_error
example_data$Dose <- factor(example_data$Dose, levels = c("Low", "High"))

Upvotes: 1

Views: 100

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226742

I think you may be misunderstanding what corSymm does; it fits an unstructured correlation matrix within groups, which is probably not what you want (and is not included in the SAS specification ...) Perhaps you're confusing this with the within-group compound symmetry (i.e., every pair of observations within a group is equally correlated) that is induced by an intercept random effect on groups (which is already in your model)?

From ?corSymm:

form: a one sided formula of the form ‘~ t’, or ‘~ t | g’, specifying a time covariate ‘t’ and, optionally, a grouping factor ‘g’ ... Defaults to ‘~ 1’, which corresponds to using the order of the observations in the data as a covariate, and no groups. [emphasis added]

The documentation isn't completely explicit here, but this implies that corSymm = ~ 1 | g will fit a random effect with covariates 1 .. n which are the within-group indices of the observations. We'll see shortly what this does.

Fit the model with and without corSymm (code to simulate the data is at the end of the question):

library(nlme)
m2 <- lme(CFB ~ BASE + Dose + ADY + Dose:ADY + BASE:ADY,
          random = ~ 1 + ADY | SubjectID,
          data = dd, method = "REML")
m2B <- update(m2, correlation = corSymm(form = ~ 1 | SubjectID))

Look at the log-likelihoods:

logLik(m2)
## 'log Lik.' -1614.338 (df=10)
 logLik(m2B)
## 'log Lik.' -1593.981 (df=55)

m2B (with correlation) has 45 more parameters — my example has 10 observations per group, so the correlation matrix is 10×10 and requires (10*9)/2 = 45 parameters — and a much larger log-likelihood (because the model is much more complex, it can fit the data more precisely).

If we look at print(m2B) or summary(m2B) we can see that the model indeed includes a full 10×10 estimated correlation matrix:

Correlation Structure: General
 Formula: ~1 | SubjectID 
 Parameter estimate(s):
 Correlation: 
   1      2      3      4      5      6      7      8      9     
2  -0.143                                                        
3   0.182  0.102                                                 
4   0.012  0.061  0.181                                          
5  -0.138 -0.055  0.060 -0.176                                   
6   0.112 -0.027  0.050 -0.101 -0.224                            
7  -0.177 -0.158 -0.127 -0.134 -0.141  0.108                     
8   0.026  0.256  0.188  0.167 -0.196  0.169  0.191              
9   0.017  0.148  0.124 -0.069  0.068 -0.136 -0.103  0.054       
10  0.009  0.096  0.117  0.112 -0.065  0.189  0.086  0.216  0.110

simulated data

set.seed(101)
dd <- data.frame(Dose = rnorm(1000), SubjectID = gl(100, 10),
                 ADY = rnorm(1000), BASE = factor(rep(1:2, 500)),
                 step = gl(10,100))

## use ::: to avoid having to load lme4 before running ...
dd$CFB <- lme4:::simulate.formula(~ BASE + Dose + ADY + Dose:ADY + BASE:ADY + (1 + ADY | SubjectID),
                   newdata = dd,
                   newparams = list(beta = rep(0, 6), theta = rep(1, 3),
                                    sigma = 1))[[1]]

Upvotes: 6

Related Questions