Reputation: 29
I'm trying to convert the following SAS code in R to get the same result that I get from SAS. Here is the SAS code:
DATA plants;
INPUT sample $ treatmt $ y ;
cards;
1 trt1 6.426264755
1 trt1 6.95419631
1 trt1 6.64385619
1 trt2 7.348728154
1 trt2 6.247927513
1 trt2 6.491853096
2 trt1 2.807354922
2 trt1 2.584962501
2 trt1 3.584962501
2 trt2 3.906890596
2 trt2 3
2 trt2 3.459431619
3 trt1 2
3 trt1 4.321928095
3 trt1 3.459431619
3 trt2 3.807354922
3 trt2 3
3 trt2 2.807354922
4 trt1 0
4 trt1 0
4 trt1 0
4 trt2 0
4 trt2 0
4 trt2 0
;
RUN;
PROC MIXED ASYCOV NOBOUND DATA=plants ALPHA=0.05 method=ML;
CLASS sample treatmt;
MODEL y = treatmt ;
RANDOM int treatmt/ subject=sample ;
RUN;
I get the following covariance estimates from SAS:
Intercept sample ==> 5.5795 Treatmt sample ==> -0.08455 Residual ==> 0.3181
I tried the following in R, but I get different results.
s=as.factor(sample)
lmer(y~ 1+treatmt+(1|treatmt:s),REML=FALSE)
Upvotes: 2
Views: 4989
Reputation: 84529
You are using the SAS option NOBOUND
which allows negative estimates of the variances, and you get a negative estimate. This is not possible with lmer
, which constraints the variances to be positive.
We can try to get the SAS results manually. Firstly, note that the equivalent lmer
syntax is:
lmer(y ~ 1 + treatment + (1+treatment|sample), REML=FALSE, data = dat)
Let's maximize the log-likelihood, allowing negative variances:
dattxt <- "1 trt1 6.426264755
1 trt1 6.95419631
1 trt1 6.64385619
1 trt2 7.348728154
1 trt2 6.247927513
1 trt2 6.491853096
2 trt1 2.807354922
2 trt1 2.584962501
2 trt1 3.584962501
2 trt2 3.906890596
2 trt2 3
2 trt2 3.459431619
3 trt1 2
3 trt1 4.321928095
3 trt1 3.459431619
3 trt2 3.807354922
3 trt2 3
3 trt2 2.807354922
4 trt1 0
4 trt1 0
4 trt1 0
4 trt2 0
4 trt2 0
4 trt2 0
"
dat <- read.table(text = dattxt)
names(dat) <- c("sample", "treatment", "y")
dat$sample <- as.factor(dat$sample)
opts <- options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))
library(lme4)
fit <- lmer(y ~ 1 + treatment + (1+treatment|sample), REML=FALSE, data = dat)
# marginal variance matrix in function of variance components
Vfun <- function(fit, vcs){
Z <- getME(fit, "Z")
n <- getME(fit, "n")
l_i <- getME(fit, "l_i")
sigma2_a <- vcs[1]
sigma2_b <- vcs[2]
sigma_ab <- vcs[3]
sigma2 <- vcs[4]
G <- matrix(c(sigma2_a, sigma_ab, sigma_ab, sigma2_b), nrow = 2)
R <- Diagonal(n, sigma2)
Z %*% bdiag(rep(list(G),l_i)) %*% t(Z) + R
}
# minus log-likelihood
library(mvtnorm)
logLHD <- function(params, fit){
X <- getME(fit, "X")
beta <- params[1:ncol(X)]
y <- getME(fit, "y")
vcs <- tail(params, length(params)-ncol(X))
V <- as.matrix(Vfun(fit, vcs))
if(any(eigen(V)$values <= 0)){
return(runif(1, 1e7, 1e8)) # return a high-value if V is not positive
}
-dmvnorm(y, c(X%*%beta), sigma = V, log = TRUE)
}
# optimization of log-likelihood
library(dfoptim)
start <-
c(fixef(fit), vc$sample[1,1], vc$sample[2,2], vc$sample[1,2], sigma(fit)^2)
names(start)[3:6] <-
c("sample.Intercept", "sample.trt1", "covariance", "sigma2")
opt <- hjkb(start, logLHD, lower=c(-Inf,-Inf,-Inf,-Inf,-Inf,0), fit=fit)
### results
opt$par
# (Intercept) treatmenttrt1 sample.Intercept sample.trt1 covariance sigma2
# 3.33912840 -0.10721533 5.50671885 -0.16909628 0.07275635 0.31812378
The residual variance is the same as the one obtained with SAS. To get the other SAS results, one has to do some gymnastic with our results, I don't understand why, but we get them in this way:
### SAS results
opt$par[["sample.Intercept"]] + opt$par[["covariance"]]
# 5.579475
opt$par[["sample.trt1"]] / 2
# -0.08454814
Note that the log-likelihood is indeed better maximized with the negative variance:
### remark: lmer achieves a lower log-likelihood
logLik(fit)
# 'log Lik.' -27.88947 (df=6)
-opt$value
# -26.43355
I would appreciate if someone could explain the required gymnastic...
Sorry, this is not the good model. The model is:
lmer(y ~ 1 + treatment + (1|sample/treatment), REML=FALSE, data = dat)
Here are the SAS results:
opts <- options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))
library(lme4)
fit <- lmer(y ~ 1+treatment+(1|sample/treatment), REML=FALSE, data = dat)
vc <- VarCorr(fit)
Vfun <- function(fit, vcs){
Z <- getME(fit, "Z")
n <- getME(fit, "n")
l_i <- getME(fit, "l_i")
G <- Diagonal(sum(l_i), rep(vcs[1:2], l_i))
R <- Diagonal(n, vcs[3])
Z %*% G %*% t(Z) + R
}
library(mvtnorm)
logLHD <- function(params, fit){
X <- getME(fit, "X")
beta <- params[1:ncol(X)]
y <- getME(fit, "y")
vcs <- tail(params, length(params)-ncol(X))
V <- as.matrix(Vfun(fit, vcs))
if(any(eigen(V)$values <= 0)) return(runif(1, 1e7, 1e8))
-dmvnorm(y, c(X%*%beta), sigma = V, log = TRUE)
}
library(dfoptim)
start <- c(fixef(fit), vc[[1]], vc[[2]], sigma(fit)^2)
opt <- hjkb(start, logLHD, lower=c(-Inf,-Inf,-Inf,-Inf,0), fit=fit)
opt$par[3:5]
# -0.08454877 5.57947601 0.31812697
Upvotes: 0
Reputation: 7435
I don't know if you'll be able to get the exact results from SAS to R, but I was able to get close by dealing with contrast
as outlined here :
lmer for SAS PROC MIXED Users : page 6
When comparing estimates produced by SAS PROC MIXED and by lmer one must be careful to consider the contrasts that are used to define the effects of factors. In SAS a model with an intercept and a qualitative factor is defined in terms of the intercept and the indicator variables for all but the last level of the factor. The default behaviour in S is to use the Helmert contrasts for the factor. On a balanced factor these provide a set of orthogonal contrasts. In R the default is the “treatment” contrasts which are almost the same as the SAS parameterization except that they drop the indicator of the first level, not the last level. When in doubt, check which contrasts are being used with the contrasts function. To make comparisons easier, you may find it worthwhile to declare
options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))
at the beginning of your session.
dput :
df <- structure(list(sample = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L),
treatmt = c("trt1", "trt1", "trt1", "trt2", "trt2", "trt2",
"trt1", "trt1", "trt1", "trt2", "trt2", "trt2", "trt1", "trt1",
"trt1", "trt2", "trt2", "trt2", "trt1", "trt1", "trt1", "trt2",
"trt2", "trt2"), y = c(6.426264755, 6.95419631, 6.64385619,
7.348728154, 6.247927513, 6.491853096, 2.807354922, 2.584962501,
3.584962501, 3.906890596, 3, 3.459431619, 2, 4.321928095,
3.459431619, 3.807354922, 3, 2.807354922, 0, 0, 0, 0, 0,
0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-24L), .Names = c("sample", "treatmt", "y"))
Current Code :
options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))
df$sample=as.factor(df$sample)
lmer(y~ 1+treatmt+(1|treatmt:sample),REML=FALSE, data = df)
Current Output :
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: y ~ 1 + treatmt + (1 | treatmt:sample)
Data: df
AIC BIC logLik deviance df.resid
80.3564 85.0686 -36.1782 72.3564 20
Random effects:
Groups Name Std.Dev.
treatmt:sample (Intercept) 2.344
Residual 0.564
Number of obs: 24, groups: treatmt:sample, 8
Fixed Effects:
(Intercept) treatmttrt1
3.3391 -0.1072
Upvotes: 2