Bobe Kryant
Bobe Kryant

Reputation: 2100

Hide some coefficients in regression summary while still returning call, r-squared and other summary output

My question is similar to this one however I am interested in returning all the other outputs, not just coefficients. Here is sample code to make my question clearer.

data=as.data.frame(matrix(rnorm(50*50),50,50))
summary(lm(data[,1]~.-data[,1],data=data))

I want to only output say the first 5 coefficients. I know I can just do that with summary(lm(data[,1]~.-data[,1],data=data))$coeff[1:5,], but that would get rid of all the other outputs I want. I also know I can get each output individually, I just want to know if there is a succicnt way of writing a one liner and cutting out variables that I do not want reported.

Upvotes: 1

Views: 6637

Answers (1)

eipi10
eipi10

Reputation: 93861

You can select the coefficients you want with a slight modification to the print.summary.lm function, which is the internal function that R uses to output the summary results for a summary.lm object.

First, get the code for the function as follows:

getAnywhere(print.summary.lm)

Then, we need to figure out where the coefficient table is extracted and subset it to the rows we want. We'll add a new my.rows argument to the function and then subset to those rows when we extract the coefficient table. The code for the modified function is at the end of this answer.

Now, compare the standard summary with our new summary. First I'll create a model with real data (The model you provided isn't specified properly. Looks like what you intended was lm(V1 ~ ., data=data), but even then there are no residual degrees of freedom, so I thought I'd demonstrate with a real data set.):

m1 = lm(mpg ~ wt + hp + cyl + vs + am, data=mtcars)

Standard summary:

summary(m1)
Call:
  lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6729 -1.6583 -0.4297  1.3307  5.4688 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.24160    5.48527   6.060 2.11e-06 ***
wt          -2.54332    0.93506  -2.720   0.0115 *  
hp          -0.02589    0.01387  -1.866   0.0733 .  
cyl         -0.40179    0.79364  -0.506   0.6169    
vs           1.17067    1.81283   0.646   0.5241    
am           1.97575    1.64825   1.199   0.2415    
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.537 on 26 degrees of freedom
Multiple R-squared:  0.8514,  Adjusted R-squared:  0.8228 
F-statistic:  29.8 on 5 and 26 DF,  p-value: 5.571e-10

New summary with only the coefficients we choose:

Note that we need to call summary on the model first, since my.summary.lm is expecting the summary object, not the model object itself.

my.summary.lm(summary(m1), my.rows=2:4)

Rather than selecting coefficients by index, you might prefer to select by name:

my.summary.lm(summary(m1), my.rows=grep("wt|hp|cyl", names(coef(m1))))
Call:
  lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6729 -1.6583 -0.4297  1.3307  5.4688 

Coefficients:
    Estimate Std. Error t value Pr(>|t|)  
wt  -2.54332    0.93506  -2.720   0.0115 *
hp  -0.02589    0.01387  -1.866   0.0733 .
cyl -0.40179    0.79364  -0.506   0.6169  
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.537 on 26 degrees of freedom
Multiple R-squared:  0.8514,  Adjusted R-squared:  0.8228 
F-statistic:  29.8 on 5 and 26 DF,  p-value: 5.571e-10

And here's the function. I've made only two changes to the original function, both of which are marked with inline comments. The first change is the additional my.rows argument. The second is in the line that begins coefs <-:

my.summary.lm = function (x, digits = max(3L, getOption("digits") - 3L), 
                       symbolic.cor = x$symbolic.cor, 
                       signif.stars = getOption("show.signif.stars"), 
                       my.rows, ...)                     # NOTE NEW my.rows ARGUMENT
{
  cat("\nCall:\n", paste(deparse(x$call), sep = "\n", collapse = "\n"), 
      "\n\n", sep = "")
  resid <- x$residuals
  df <- x$df
  rdf <- df[2L]
  cat(if (!is.null(x$weights) && diff(range(x$weights))) 
    "Weighted ", "Residuals:\n", sep = "")
  if (rdf > 5L) {
    nam <- c("Min", "1Q", "Median", "3Q", "Max")
    rq <- if (length(dim(resid)) == 2L) 
      structure(apply(t(resid), 1L, quantile), dimnames = list(nam, 
                                                               dimnames(resid)[[2L]]))
    else {
      zz <- zapsmall(quantile(resid), digits + 1L)
      structure(zz, names = nam)
    }
    print(rq, digits = digits, ...)
  }
  else if (rdf > 0L) {
    print(resid, digits = digits, ...)
  }
  else {
    cat("ALL", df[1L], "residuals are 0: no residual degrees of freedom!")
    cat("\n")
  }
  if (length(x$aliased) == 0L) {
    cat("\nNo Coefficients\n")
  }
  else {
    if (nsingular <- df[3L] - df[1L]) 
      cat("\nCoefficients: (", nsingular, " not defined because of singularities)\n", 
          sep = "")
    else cat("\nCoefficients:\n")
    coefs <- x$coefficients[my.rows,]                      # SUBSET my.rows
    if (!is.null(aliased <- x$aliased) && any(aliased)) {
      cn <- names(aliased)
      coefs <- matrix(NA, length(aliased), 4, dimnames = list(cn, 
                                                              colnames(coefs)))
      coefs[!aliased, ] <- x$coefficients
    }
    printCoefmat(coefs, digits = digits, signif.stars = signif.stars, 
                 na.print = "NA", ...)
  }
  cat("\nResidual standard error:", format(signif(x$sigma, 
                                                  digits)), "on", rdf, "degrees of freedom")
  cat("\n")
  if (nzchar(mess <- naprint(x$na.action))) 
    cat("  (", mess, ")\n", sep = "")
  if (!is.null(x$fstatistic)) {
    cat("Multiple R-squared: ", formatC(x$r.squared, digits = digits))
    cat(",\tAdjusted R-squared: ", formatC(x$adj.r.squared, 
                                           digits = digits), "\nF-statistic:", formatC(x$fstatistic[1L], 
                                                                                       digits = digits), "on", x$fstatistic[2L], "and", 
        x$fstatistic[3L], "DF,  p-value:", format.pval(pf(x$fstatistic[1L], 
                                                          x$fstatistic[2L], x$fstatistic[3L], lower.tail = FALSE), 
                                                       digits = digits))
    cat("\n")
  }
  correl <- x$correlation
  if (!is.null(correl)) {
    p <- NCOL(correl)
    if (p > 1L) {
      cat("\nCorrelation of Coefficients:\n")
      if (is.logical(symbolic.cor) && symbolic.cor) {
        print(symnum(correl, abbr.colnames = NULL))
      }
      else {
        correl <- format(round(correl, 2), nsmall = 2, 
                         digits = digits)
        correl[!lower.tri(correl)] <- ""
        print(correl[-1, -p, drop = FALSE], quote = FALSE)
      }
    }
  }
  cat("\n")
  invisible(x)
}

UPDATE: As I mentioned in my comment, you can roll your own summary function and set it up to work with whatever type of model summary objects you regularly use. In this case, we'll include summary.lm and summary.plm objects, which are the types of objects created when you run summary on lm and plm model objects, respectively.

First we need lm and plm model objects to work with:

# lm object
m1 = lm(mpg ~ wt + hp + cyl + vs + am, data=mtcars)

# plm object
library(plm)

# Example from plm help
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state","year"))

Now we need a function to output the summary data we want. To create the code below, I just looked at what was in the list objects created by summary(m1) and summary(zz) (do str(summary(m1)) and str(summary(zz)) to see these) so I'd know where to get various elements of the summary objects (such as the call and the residuals). I also in some cases just directly copied code sections from the print.summary.lm and print.summary.plm functions.

The function below doesn't print everything that's included in the native summary functions, but should be enough to show you how to go about adding whatever elements you'd like in the output.

# Summary function that allows selection of which coefficients to include 
# in the coefficient table
# Works with summary.lm and summary.plm objects
my.summary = function(x, rows, digits=3) {

  # Print a few summary elements that are common to both lm and plm model summary objects
  cat("Call\n")
  print(x$call)
  cat("\nResiduals\n")
  print(summary(x$residuals))
  cat("\n")
  print(coef(x)[rows,])

  # Print elements unique to lm model summary objects
  if("summary.lm" %in% class(x)) {
    cat("\nResidual standard error:", round(x$sigma,3), "on", x$df[2], "degrees of freedom")
    cat(paste(c("\nF-statistic:", " on"," and"), round(x$fstatistic,2), collapse=""),
        "DF, p-value:",
        format.pval(pf(x$fstatistic[1L], x$fstatistic[2L], x$fstatistic[3L], 
                       lower.tail = FALSE), digits=digits))

  # Print elements unique to plm model summary objects  
  } else if ("summary.plm" %in% class(x)) {
    cat(paste("\nResidual Sum of Squares: ", signif(deviance(x), 
                                                  digits), "\n", sep = ""))
    fstat <- x$fstatistic
    if (names(fstat$statistic) == "F") {
      cat(paste("F-statistic: ", signif(fstat$statistic), " on ", 
                fstat$parameter["df1"], " and ", fstat$parameter["df2"], 
                " DF, p-value: ", format.pval(fstat$p.value, digits = digits), 
                "\n", sep = ""))
    }
    else {
      cat(paste("Chisq: ", signif(fstat$statistic), " on ", 
                fstat$parameter, " DF, p-value: ", format.pval(fstat$p.value, 
                                                               digits = digits), "\n", sep = ""))
    }
  }
}

Now run the function on an lm model and a plm model:

my.summary(summary(m1), 2:4)
Call
lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars)

Residuals
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-3.6730 -1.6580 -0.4297  0.0000  1.3310  5.4690 

       Estimate Std. Error    t value   Pr(>|t|)
wt  -2.54331718 0.93506164 -2.7199460 0.01148231
hp  -0.02588661 0.01387176 -1.8661377 0.07334148
cyl -0.40178727 0.79364098 -0.5062582 0.61694148

Residual standard error: 2.537 on 26 degrees of freedom
F-statistic: 29.8 on 5 and 26 DF, p-value: 5.57e-10
my.summary(summary(zz), 2:3)
Call
plm(formula = log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, 
    data = Produc, index = c("state", "year"))

Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.120500 -0.023740 -0.002041  0.000000  0.018140  0.174700 

          Estimate Std. Error  t-value      Pr(>|t|)
log(pc)  0.2920069 0.02511967 11.62463  7.075069e-29
log(emp) 0.7681595 0.03009174 25.52725 2.021455e-104

Residual Sum of Squares: 1.11
F-statistic: 3064.81 on 4 and 764 DF, p-value: <2e-16

I suppose if you really want to go all the way, you could take advantage of object orientation and write your own generic function with methods for each type of model you want to include.

Upvotes: 6

Related Questions