Reputation: 2100
My question is similar to this one however I am interested in returning all the other outputs, not just coefficients. Here is sample code to make my question clearer.
data=as.data.frame(matrix(rnorm(50*50),50,50))
summary(lm(data[,1]~.-data[,1],data=data))
I want to only output say the first 5 coefficients. I know I can just do that with
summary(lm(data[,1]~.-data[,1],data=data))$coeff[1:5,]
, but that would get rid of all the other outputs I want. I also know I can get each output individually, I just want to know if there is a succicnt way of writing a one liner and cutting out variables that I do not want reported.
Upvotes: 1
Views: 6637
Reputation: 93861
You can select the coefficients you want with a slight modification to the print.summary.lm
function, which is the internal function that R uses to output the summary results for a summary.lm
object.
First, get the code for the function as follows:
getAnywhere(print.summary.lm)
Then, we need to figure out where the coefficient table is extracted and subset it to the rows we want. We'll add a new my.rows
argument to the function and then subset to those rows when we extract the coefficient table. The code for the modified function is at the end of this answer.
Now, compare the standard summary with our new summary. First I'll create a model with real data (The model you provided isn't specified properly. Looks like what you intended was lm(V1 ~ ., data=data)
, but even then there are no residual degrees of freedom, so I thought I'd demonstrate with a real data set.):
m1 = lm(mpg ~ wt + hp + cyl + vs + am, data=mtcars)
Standard summary:
summary(m1)
Call: lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars) Residuals: Min 1Q Median 3Q Max -3.6729 -1.6583 -0.4297 1.3307 5.4688 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 33.24160 5.48527 6.060 2.11e-06 *** wt -2.54332 0.93506 -2.720 0.0115 * hp -0.02589 0.01387 -1.866 0.0733 . cyl -0.40179 0.79364 -0.506 0.6169 vs 1.17067 1.81283 0.646 0.5241 am 1.97575 1.64825 1.199 0.2415 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 2.537 on 26 degrees of freedom Multiple R-squared: 0.8514, Adjusted R-squared: 0.8228 F-statistic: 29.8 on 5 and 26 DF, p-value: 5.571e-10
New summary with only the coefficients we choose:
Note that we need to call summary
on the model first, since my.summary.lm
is expecting the summary object, not the model object itself.
my.summary.lm(summary(m1), my.rows=2:4)
Rather than selecting coefficients by index, you might prefer to select by name:
my.summary.lm(summary(m1), my.rows=grep("wt|hp|cyl", names(coef(m1))))
Call: lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars) Residuals: Min 1Q Median 3Q Max -3.6729 -1.6583 -0.4297 1.3307 5.4688 Coefficients: Estimate Std. Error t value Pr(>|t|) wt -2.54332 0.93506 -2.720 0.0115 * hp -0.02589 0.01387 -1.866 0.0733 . cyl -0.40179 0.79364 -0.506 0.6169 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 2.537 on 26 degrees of freedom Multiple R-squared: 0.8514, Adjusted R-squared: 0.8228 F-statistic: 29.8 on 5 and 26 DF, p-value: 5.571e-10
And here's the function. I've made only two changes to the original function, both of which are marked with inline comments. The first change is the additional my.rows
argument. The second is in the line that begins coefs <-
:
my.summary.lm = function (x, digits = max(3L, getOption("digits") - 3L),
symbolic.cor = x$symbolic.cor,
signif.stars = getOption("show.signif.stars"),
my.rows, ...) # NOTE NEW my.rows ARGUMENT
{
cat("\nCall:\n", paste(deparse(x$call), sep = "\n", collapse = "\n"),
"\n\n", sep = "")
resid <- x$residuals
df <- x$df
rdf <- df[2L]
cat(if (!is.null(x$weights) && diff(range(x$weights)))
"Weighted ", "Residuals:\n", sep = "")
if (rdf > 5L) {
nam <- c("Min", "1Q", "Median", "3Q", "Max")
rq <- if (length(dim(resid)) == 2L)
structure(apply(t(resid), 1L, quantile), dimnames = list(nam,
dimnames(resid)[[2L]]))
else {
zz <- zapsmall(quantile(resid), digits + 1L)
structure(zz, names = nam)
}
print(rq, digits = digits, ...)
}
else if (rdf > 0L) {
print(resid, digits = digits, ...)
}
else {
cat("ALL", df[1L], "residuals are 0: no residual degrees of freedom!")
cat("\n")
}
if (length(x$aliased) == 0L) {
cat("\nNo Coefficients\n")
}
else {
if (nsingular <- df[3L] - df[1L])
cat("\nCoefficients: (", nsingular, " not defined because of singularities)\n",
sep = "")
else cat("\nCoefficients:\n")
coefs <- x$coefficients[my.rows,] # SUBSET my.rows
if (!is.null(aliased <- x$aliased) && any(aliased)) {
cn <- names(aliased)
coefs <- matrix(NA, length(aliased), 4, dimnames = list(cn,
colnames(coefs)))
coefs[!aliased, ] <- x$coefficients
}
printCoefmat(coefs, digits = digits, signif.stars = signif.stars,
na.print = "NA", ...)
}
cat("\nResidual standard error:", format(signif(x$sigma,
digits)), "on", rdf, "degrees of freedom")
cat("\n")
if (nzchar(mess <- naprint(x$na.action)))
cat(" (", mess, ")\n", sep = "")
if (!is.null(x$fstatistic)) {
cat("Multiple R-squared: ", formatC(x$r.squared, digits = digits))
cat(",\tAdjusted R-squared: ", formatC(x$adj.r.squared,
digits = digits), "\nF-statistic:", formatC(x$fstatistic[1L],
digits = digits), "on", x$fstatistic[2L], "and",
x$fstatistic[3L], "DF, p-value:", format.pval(pf(x$fstatistic[1L],
x$fstatistic[2L], x$fstatistic[3L], lower.tail = FALSE),
digits = digits))
cat("\n")
}
correl <- x$correlation
if (!is.null(correl)) {
p <- NCOL(correl)
if (p > 1L) {
cat("\nCorrelation of Coefficients:\n")
if (is.logical(symbolic.cor) && symbolic.cor) {
print(symnum(correl, abbr.colnames = NULL))
}
else {
correl <- format(round(correl, 2), nsmall = 2,
digits = digits)
correl[!lower.tri(correl)] <- ""
print(correl[-1, -p, drop = FALSE], quote = FALSE)
}
}
}
cat("\n")
invisible(x)
}
UPDATE: As I mentioned in my comment, you can roll your own summary function and set it up to work with whatever type of model summary objects you regularly use. In this case, we'll include summary.lm
and summary.plm
objects, which are the types of objects created when you run summary
on lm
and plm
model objects, respectively.
First we need lm and plm model objects to work with:
# lm object
m1 = lm(mpg ~ wt + hp + cyl + vs + am, data=mtcars)
# plm object
library(plm)
# Example from plm help
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc, index = c("state","year"))
Now we need a function to output the summary data we want. To create the code below, I just looked at what was in the list objects created by summary(m1)
and summary(zz)
(do str(summary(m1))
and str(summary(zz))
to see these) so I'd know where to get various elements of the summary objects (such as the call
and the residuals
). I also in some cases just directly copied code sections from the print.summary.lm
and print.summary.plm
functions.
The function below doesn't print everything that's included in the native summary functions, but should be enough to show you how to go about adding whatever elements you'd like in the output.
# Summary function that allows selection of which coefficients to include
# in the coefficient table
# Works with summary.lm and summary.plm objects
my.summary = function(x, rows, digits=3) {
# Print a few summary elements that are common to both lm and plm model summary objects
cat("Call\n")
print(x$call)
cat("\nResiduals\n")
print(summary(x$residuals))
cat("\n")
print(coef(x)[rows,])
# Print elements unique to lm model summary objects
if("summary.lm" %in% class(x)) {
cat("\nResidual standard error:", round(x$sigma,3), "on", x$df[2], "degrees of freedom")
cat(paste(c("\nF-statistic:", " on"," and"), round(x$fstatistic,2), collapse=""),
"DF, p-value:",
format.pval(pf(x$fstatistic[1L], x$fstatistic[2L], x$fstatistic[3L],
lower.tail = FALSE), digits=digits))
# Print elements unique to plm model summary objects
} else if ("summary.plm" %in% class(x)) {
cat(paste("\nResidual Sum of Squares: ", signif(deviance(x),
digits), "\n", sep = ""))
fstat <- x$fstatistic
if (names(fstat$statistic) == "F") {
cat(paste("F-statistic: ", signif(fstat$statistic), " on ",
fstat$parameter["df1"], " and ", fstat$parameter["df2"],
" DF, p-value: ", format.pval(fstat$p.value, digits = digits),
"\n", sep = ""))
}
else {
cat(paste("Chisq: ", signif(fstat$statistic), " on ",
fstat$parameter, " DF, p-value: ", format.pval(fstat$p.value,
digits = digits), "\n", sep = ""))
}
}
}
Now run the function on an lm
model and a plm
model:
my.summary(summary(m1), 2:4)
Call lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars) Residuals Min. 1st Qu. Median Mean 3rd Qu. Max. -3.6730 -1.6580 -0.4297 0.0000 1.3310 5.4690 Estimate Std. Error t value Pr(>|t|) wt -2.54331718 0.93506164 -2.7199460 0.01148231 hp -0.02588661 0.01387176 -1.8661377 0.07334148 cyl -0.40178727 0.79364098 -0.5062582 0.61694148 Residual standard error: 2.537 on 26 degrees of freedom F-statistic: 29.8 on 5 and 26 DF, p-value: 5.57e-10
my.summary(summary(zz), 2:3)
Call plm(formula = log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state", "year")) Residuals Min. 1st Qu. Median Mean 3rd Qu. Max. -0.120500 -0.023740 -0.002041 0.000000 0.018140 0.174700 Estimate Std. Error t-value Pr(>|t|) log(pc) 0.2920069 0.02511967 11.62463 7.075069e-29 log(emp) 0.7681595 0.03009174 25.52725 2.021455e-104 Residual Sum of Squares: 1.11 F-statistic: 3064.81 on 4 and 764 DF, p-value: <2e-16
I suppose if you really want to go all the way, you could take advantage of object orientation and write your own generic function with methods for each type of model you want to include.
Upvotes: 6