Reputation: 3647
I am estimating some spatial econometric models that contain both a spatial autoregressive term rho and a spatial error term lambda. In attempting to communicate my results I was using the texreg
package, which accepts the sacsarlm
models I am working with. I noticed, however, that texreg
is printing p-values for the rho and lambda parameters that are identical. Texreg
appears to be returning the p-value found in the model@LR1$p.value
slot of the model object.
The parameters rho and lambda are different in magnitude and have different standard errors so they should not have equivalent p-values. If I call summary on the model object I get unique p-values, but cannot figure out where those values are stored within the model object despite going through each element in a str(model)
call.
My question is twofold:
Below is a minimal example that shows the problem:
library(spdep)
library(texreg)
set.seed(42)
W.ran <- matrix(rbinom(100*100, 1, .3),nrow=100)
X <- rnorm(100)
Y <- .2 * X + rnorm(100) + .9*(W.ran %*% X)
W.test <- mat2listw(W.ran)
model <- sacsarlm(Y~X, type = "sacmixed",
listw=W.test, zero.policy=TRUE)
summary(model)
Call:sacsarlm(formula = Y ~ X, listw = W.test, type = "sacmixed",
zero.policy = TRUE)
Residuals:
Min 1Q Median 3Q Max
-2.379283 -0.750922 0.036044 0.675951 2.577148
Type: sacmixed
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.91037455 0.65700059 1.3857 0.1659
X -0.00076362 0.10330510 -0.0074 0.9941
lag.(Intercept) -0.03193863 0.02310075 -1.3826 0.1668
lag.X 0.89764491 0.02231353 40.2287 <2e-16
Rho: -0.0028541
Asymptotic standard error: 0.0059647
z-value: -0.47849, p-value: 0.6323
Lambda: -0.020578
Asymptotic standard error: 0.020057
z-value: -1.026, p-value: 0.3049
LR test value: 288.74, p-value: < 2.22e-16
Log likelihood: -145.4423 for sacmixed model
ML residual variance (sigma squared): 1.0851, (sigma: 1.0417)
Number of observations: 100
Number of parameters estimated: 7
AIC: 304.88, (AIC for lm: 585.63)
screenreg(model)
=================================
Model 1
---------------------------------
(Intercept) 0.91
(0.66)
X -0.00
(0.10)
lag.(Intercept) -0.03
(0.02)
lag.X 0.90 ***
(0.02)
---------------------------------
Num. obs. 100
Parameters 7
AIC (Linear model) 585.63
AIC (Spatial model) 304.88
Log Likelihood -145.44
Wald test: statistic 1.05
Wald test: p-value 0.90
Lambda: statistic -0.02
Lambda: p-value 0.00
Rho: statistic -0.00
Rho: p-value 0.00
=================================
*** p < 0.001, ** p < 0.01, * p < 0.05
Obviously, in the example, Rho and Lambda have different p-values neither of which are zero and thus there is a problem with the texreg output. Any help with why this is occurring or where to obtain the correct p-values much appreciated!
Upvotes: 4
Views: 749
Reputation: 2323
texreg
author here. Thanks for catching this. As described in my reply here, texreg
uses extract
methods to retrieve the relevant information from any of the (currently more than 70 supported) model object types. It looks like there is a glitch in the GOF part of the method for sarlm
objects.
Here is what the method currently looks like (as of texreg
1.36.13):
# extension for sarlm objects (spdep package)
extract.sarlm <- function(model, include.nobs = TRUE, include.aic = TRUE,
include.loglik = TRUE, include.wald = TRUE, include.lambda = TRUE,
include.rho = TRUE, ...) {
s <- summary(model, ...)
names <- rownames(s$Coef)
cf <- s$Coef[, 1]
se <- s$Coef[, 2]
p <- s$Coef[, ncol(s$Coef)]
gof <- numeric()
gof.names <- character()
gof.decimal <- logical()
if (include.nobs == TRUE) {
n <- length(s$fitted.values)
param <- s$parameters
gof <- c(gof, n, param)
gof.names <- c(gof.names, "Num.\ obs.", "Parameters")
gof.decimal <- c(gof.decimal, FALSE, FALSE)
}
if (include.aic == TRUE) {
aic <- AIC(model)
aiclm <- s$AIC_lm.model
gof <- c(gof, aiclm, aic)
gof.names <- c(gof.names, "AIC (Linear model)", "AIC (Spatial model)")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.loglik == TRUE) {
ll <- s$LL
gof <- c(gof, ll)
gof.names <- c(gof.names, "Log Likelihood")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.wald == TRUE) {
waldstat <- s$Wald1$statistic
waldp <- s$Wald1$p.value
gof <- c(gof, waldstat, waldp)
gof.names <- c(gof.names, "Wald test: statistic", "Wald test: p-value")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.lambda == TRUE && !is.null(s$lambda)) {
lambda <- s$lambda
LRpval <- s$LR1$p.value[1]
gof <- c(gof, lambda, LRpval)
gof.names <- c(gof.names, "Lambda: statistic", "Lambda: p-value")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.rho == TRUE && !is.null(s$rho)) {
rho <- s$rho
LRpval <- s$LR1$p.value[1]
gof <- c(gof, rho, LRpval)
gof.names <- c(gof.names, "Rho: statistic", "Rho: p-value")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
tr <- createTexreg(
coef.names = names,
coef = cf,
se = se,
pvalues = p,
gof.names = gof.names,
gof = gof,
gof.decimal = gof.decimal
)
return(tr)
}
setMethod("extract", signature = className("sarlm", "spdep"),
definition = extract.sarlm)
I think you are right that the lambda and rho parts need some updating. The sacsarlm
function does not store the results printed by its summary
method in any object, so you are rightly pointing out that any attempts using str
do not appear to show the true p-values etc.
Therefore it makes sense to take a look at what the print.summary.sarlm
function in the spdep
package actually does when it prints the summary. I found the code for this function in the file R/summary.spsarlm.R
in the source code of the package on CRAN. It looks like this:
print.summary.sarlm <- function(x, digits = max(5, .Options$digits - 3),
signif.stars = FALSE, ...)
{
cat("\nCall:", deparse(x$call), sep = "", fill=TRUE)
if (x$type == "error") if (isTRUE(all.equal(x$lambda, x$interval[1])) ||
isTRUE(all.equal(x$lambda, x$interval[2])))
warning("lambda on interval bound - results should not be used")
if (x$type == "lag" || x$type == "mixed")
if (isTRUE(all.equal(x$rho, x$interval[1])) ||
isTRUE(all.equal(x$rho, x$interval[2])))
warning("rho on interval bound - results should not be used")
cat("\nResiduals:\n")
resid <- residuals(x)
nam <- c("Min", "1Q", "Median", "3Q", "Max")
rq <- if (length(dim(resid)) == 2L)
structure(apply(t(resid), 1, quantile), dimnames = list(nam,
dimnames(resid)[[2]]))
else structure(quantile(resid), names = nam)
print(rq, digits = digits, ...)
cat("\nType:", x$type, "\n")
if (x$zero.policy) {
zero.regs <- attr(x, "zero.regs")
if (!is.null(zero.regs))
cat("Regions with no neighbours included:\n",
zero.regs, "\n")
}
if (!is.null(x$coeftitle)) {
cat("Coefficients:", x$coeftitle, "\n")
coefs <- x$Coef
if (!is.null(aliased <- x$aliased) && any(x$aliased)){
cat(" (", table(aliased)["TRUE"],
" not defined because of singularities)\n", sep = "")
cn <- names(aliased)
coefs <- matrix(NA, length(aliased), 4, dimnames = list(cn,
colnames(x$Coef)))
coefs[!aliased, ] <- x$Coef
}
printCoefmat(coefs, signif.stars=signif.stars, digits=digits,
na.print="NA")
}
# res <- LR.sarlm(x, x$lm.model)
res <- x$LR1
pref <- ifelse(x$ase, "Asymptotic", "Approximate (numerical Hessian)")
if (x$type == "error") {
cat("\nLambda: ", format(signif(x$lambda, digits)),
", LR test value: ", format(signif(res$statistic,
digits)), ", p-value: ", format.pval(res$p.value,
digits), "\n", sep="")
if (!is.null(x$lambda.se)) {
if (!is.null(x$adj.se)) {
x$lambda.se <- sqrt((x$lambda.se^2)*x$adj.se)
}
cat(pref, " standard error: ",
format(signif(x$lambda.se, digits)),
ifelse(is.null(x$adj.se), "\n z-value: ",
"\n t-value: "), format(signif((x$lambda/
x$lambda.se), digits)),
", p-value: ", format.pval(2*(1-pnorm(abs(x$lambda/
x$lambda.se))), digits), "\n", sep="")
cat("Wald statistic: ", format(signif(x$Wald1$statistic,
digits)), ", p-value: ", format.pval(x$Wald1$p.value,
digits), "\n", sep="")
}
} else if (x$type == "sac" || x$type == "sacmixed") {
cat("\nRho: ", format(signif(x$rho, digits)), "\n",
sep="")
if (!is.null(x$rho.se)) {
if (!is.null(x$adj.se)) {
x$rho.se <- sqrt((x$rho.se^2)*x$adj.se)
}
cat(pref, " standard error: ",
format(signif(x$rho.se, digits)),
ifelse(is.null(x$adj.se), "\n z-value: ",
"\n t-value: "),
format(signif((x$rho/x$rho.se), digits)),
", p-value: ", format.pval(2 * (1 - pnorm(abs(x$rho/
x$rho.se))), digits), "\n", sep="")
}
cat("Lambda: ", format(signif(x$lambda, digits)), "\n", sep="")
if (!is.null(x$lambda.se)) {
pref <- ifelse(x$ase, "Asymptotic",
"Approximate (numerical Hessian)")
if (!is.null(x$adj.se)) {
x$lambda.se <- sqrt((x$lambda.se^2)*x$adj.se)
}
cat(pref, " standard error: ",
format(signif(x$lambda.se, digits)),
ifelse(is.null(x$adj.se), "\n z-value: ",
"\n t-value: "), format(signif((x$lambda/
x$lambda.se), digits)),
", p-value: ", format.pval(2*(1-pnorm(abs(x$lambda/
x$lambda.se))), digits), "\n", sep="")
}
cat("\nLR test value: ", format(signif(res$statistic, digits)),
", p-value: ", format.pval(res$p.value, digits), "\n",
sep="")
} else {
cat("\nRho: ", format(signif(x$rho, digits)),
", LR test value: ", format(signif(res$statistic, digits)),
", p-value: ", format.pval(res$p.value, digits), "\n",
sep="")
if (!is.null(x$rho.se)) {
if (!is.null(x$adj.se)) {
x$rho.se <- sqrt((x$rho.se^2)*x$adj.se)
}
cat(pref, " standard error: ",
format(signif(x$rho.se, digits)),
ifelse(is.null(x$adj.se), "\n z-value: ",
"\n t-value: "),
format(signif((x$rho/x$rho.se), digits)),
", p-value: ", format.pval(2 * (1 - pnorm(abs(x$rho/
x$rho.se))), digits), "\n", sep="")
}
if (!is.null(x$Wald1)) {
cat("Wald statistic: ", format(signif(x$Wald1$statistic,
digits)), ", p-value: ", format.pval(x$Wald1$p.value,
digits), "\n", sep="")
}
}
cat("\nLog likelihood:", logLik(x), "for", x$type, "model\n")
cat("ML residual variance (sigma squared): ",
format(signif(x$s2, digits)), ", (sigma: ",
format(signif(sqrt(x$s2), digits)), ")\n", sep="")
if (!is.null(x$NK)) cat("Nagelkerke pseudo-R-squared:",
format(signif(x$NK, digits)), "\n")
cat("Number of observations:", length(x$residuals), "\n")
cat("Number of parameters estimated:", x$parameters, "\n")
cat("AIC: ", format(signif(AIC(x), digits)), ", (AIC for lm: ",
format(signif(x$AIC_lm.model, digits)), ")\n", sep="")
if (x$type == "error") {
if (!is.null(x$Haus)) {
cat("Hausman test: ", format(signif(x$Haus$statistic,
digits)), ", df: ", format(x$Haus$parameter),
", p-value: ", format.pval(x$Haus$p.value, digits),
"\n", sep="")
}
}
if ((x$type == "lag" || x$type == "mixed") && x$ase) {
cat("LM test for residual autocorrelation\n")
cat("test value: ", format(signif(x$LMtest, digits)),
", p-value: ", format.pval((1 - pchisq(x$LMtest, 1)),
digits), "\n", sep="")
}
if (x$type != "error" && !is.null(x$LLCoef)) {
cat("\nCoefficients: (log likelihood/likelihood ratio)\n")
printCoefmat(x$LLCoef, signif.stars=signif.stars,
digits=digits, na.print="NA")
}
correl <- x$correlation
if (!is.null(correl)) {
p <- NCOL(correl)
if (p > 1) {
cat("\n", x$correltext, "\n")
correl <- format(round(correl, 2), nsmall = 2,
digits = digits)
correl[!lower.tri(correl)] <- ""
print(correl[-1, -p, drop = FALSE], quote = FALSE)
}
}
cat("\n")
invisible(x)
}
You can see there that the function first distinguishes between different sub-models (error
vs. sac
/sacmixed
vs. else), then decides which standard errors to use, and then computes the p-values on the fly without saving them anywhere.
So this is what we need to do as well in our extract
method in order to get the same result as the summary
method in the spdep
package. We also need to move this from the GOF block up to the coefficient block of the table (see comments section below for a discussion). Here is my attempt at adopting their approach in the extract
method:
# extension for sarlm objects (spdep package)
extract.sarlm <- function(model, include.nobs = TRUE, include.loglik = TRUE,
include.aic = TRUE, include.lr = TRUE, include.wald = TRUE, ...) {
s <- summary(model, ...)
names <- rownames(s$Coef)
cf <- s$Coef[, 1]
se <- s$Coef[, 2]
p <- s$Coef[, ncol(s$Coef)]
if (model$type != "error") { # include coefficient for autocorrelation term
rho <- model$rho
cf <- c(cf, rho)
names <- c(names, "$\\rho$")
if (!is.null(model$rho.se)) {
if (!is.null(model$adj.se)) {
rho.se <- sqrt((model$rho.se^2) * model$adj.se)
} else {
rho.se <- model$rho.se
}
rho.pval <- 2 * (1 - pnorm(abs(rho / rho.se)))
se <- c(se, rho.se)
p <- c(p, rho.pval)
} else {
se <- c(se, NA)
p <- c(p, NA)
}
}
if (!is.null(model$lambda)) {
cf <-c(cf, model$lambda)
names <- c(names, "$\\lambda$")
if (!is.null(model$lambda.se)) {
if (!is.null(model$adj.se)) {
lambda.se <- sqrt((model$lambda.se^2) * model$adj.se)
} else {
lambda.se <- model$lambda.se
}
lambda.pval <- 2 * (1 - pnorm(abs(model$lambda / lambda.se)))
se <- c(se, lambda.se)
p <- c(p, lambda.pval)
} else {
se <- c(se, NA)
p <- c(p, NA)
}
}
gof <- numeric()
gof.names <- character()
gof.decimal <- logical()
if (include.nobs == TRUE) {
n <- length(s$fitted.values)
param <- s$parameters
gof <- c(gof, n, param)
gof.names <- c(gof.names, "Num.\ obs.", "Parameters")
gof.decimal <- c(gof.decimal, FALSE, FALSE)
}
if (include.loglik == TRUE) {
ll <- s$LL
gof <- c(gof, ll)
gof.names <- c(gof.names, "Log Likelihood")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.aic == TRUE) {
aic <- AIC(model)
aiclm <- s$AIC_lm.model
gof <- c(gof, aiclm, aic)
gof.names <- c(gof.names, "AIC (Linear model)", "AIC (Spatial model)")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.lr == TRUE && !is.null(s$LR1)) {
gof <- c(gof, s$LR1$statistic[[1]], s$LR1$p.value[[1]])
gof.names <- c(gof.names, "LR test: statistic", "LR test: p-value")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.wald == TRUE && !is.null(model$Wald1)) {
waldstat <- model$Wald1$statistic
waldp <- model$Wald1$p.value
gof <- c(gof, waldstat, waldp)
gof.names <- c(gof.names, "Wald test: statistic", "Wald test: p-value")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
tr <- createTexreg(
coef.names = names,
coef = cf,
se = se,
pvalues = p,
gof.names = gof.names,
gof = gof,
gof.decimal = gof.decimal
)
return(tr)
}
setMethod("extract", signature = className("sarlm", "spdep"),
definition = extract.sarlm)
You can just execute this code at run-time to update the way texreg
handles these objects. Please do let me know if you still think there are any glitches I haven't spotted. If none are reported in the comments, I will include this updated extract
method in the next texreg
release.
With these changes, calling screenreg(model, single.row = TRUE)
yields the following output:
=======================================
Model 1
---------------------------------------
(Intercept) 0.91 (0.66)
X -0.00 (0.10)
lag.(Intercept) -0.03 (0.02)
lag.X 0.90 (0.02) ***
rho -0.00 (0.01)
lambda -0.02 (0.02)
---------------------------------------
Num. obs. 100
Parameters 7
Log Likelihood -145.44
AIC (Linear model) 585.63
AIC (Spatial model) 304.88
LR test: statistic 288.74
LR test: p-value 0.00
=======================================
*** p < 0.001, ** p < 0.01, * p < 0.05
Upvotes: 6