user1253493

Reputation: 67

How can I get a detailed table list in R?

I've been using SAS for 6 years and am migrating across to R. I used to use proc contents to get a healthy description of a table: each variable's characteristics and data type.

Using str(tableName) I can see the type but not the vector position in a data frame.

Using names(tableName) I can see the names and positions of the vectors but not the type.

Using summary(tableName) I can see the quantiles/category but not the type easily or vector position.

Is there a way I can just get a list of Name, vectorPosition, type, min, max, avg, med [..]?

Upvotes: 3

Views: 3879

Answers (4)

Rappster

Reputation: 13080

This is really "quick 'n' dirty", but if I understood you correctly, this is kind of what you're after.

As an example, I took the info returned by summary() and simply added information about class and mode for each data frame column. I'm not really familiar with the table class in R, hence formatting is really off.

df <- data.frame(
    a=1:5,
    b=rep(TRUE, 5),
    c=factor(letters[1:5])  # explicit factor: since R 4.0 strings are no longer converted automatically
)

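mySummary <- function(x, ...) {
    out <- NULL
    # summary() on a data frame returns a character table, one column per variable
    a <- summary(x)
    for (ii in seq_len(ncol(x))) {
        # Prepend class and mode to the summary() entries of column ii
        temp <- list(
            c(paste("Class:", class(x[,ii])), paste("Mode:", mode(x[,ii])),
            c(a[,ii]))
        )
        names(temp) <- names(x)[ii]
        out <- c(out, temp)
    }
    out
}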

> mySummary(df)
$a

"Class: integer"  "Mode: numeric"    "Min.   :1  "    "1st Qu.:2  " 

   "Median :3  "    "Mean   :3  "    "3rd Qu.:4  "    "Max.   :5  " 

$b

"Class: logical"  "Mode: logical" "Mode:logical  " "TRUE:5        " 

"NA's:0        "               NA               NA               NA 

$c

"Class: factor" "Mode: numeric"         "a:1  "         "b:1  "         "c:1  " 

        "d:1  "         "e:1  "              NA 

You might want to check out how the summary() method for class data.frame is defined and then go ahead and tweak it to fit your needs.

Find out which methods are defined for summary()

methods("summary")

> methods("summary")
 [1] summary.aov             summary.aovlist         summary.aspell*        
 [4] summary.connection      summary.data.frame      summary.Date           
 [7] summary.default         summary.ecdf*           summary.factor         
[10] summary.glm             summary.infl            summary.lm             
[13] summary.loess*          summary.manova          summary.matrix         
[16] summary.mlm             summary.nls*            summary.packageStatus* 
[19] summary.PDF_Dictionary* summary.PDF_Stream*     summary.POSIXct        
[22] summary.POSIXlt         summary.ppr*            summary.prcomp*        
[25] summary.princomp*       summary.srcfile         summary.srcref         
[28] summary.stepfun         summary.stl*            summary.table          
[31] summary.tukeysmooth*   

   Non-visible functions are asterisked

Here's a way to get to the code

summary.data.frame
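
For the methods marked with an asterisk in the listing above, typing the name at the prompt will not find the function. A minimal sketch of how to reach them, using getS3method() and getAnywhere() from the utils package (summary.ecdf is picked here only as an example from the list):

```r
# Visible method: printing the name shows its source
summary.data.frame

# Non-visible (asterisked) method: fetch it from the S3 registry
f <- getS3method("summary", "ecdf")
is.function(f)

# getAnywhere() also searches namespaces that do not export the function
getAnywhere("summary.ecdf")
```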

Upvotes: 2

Vincent Zoonekynd

Reputation: 32351

You can use lapply to call a function on each column of the data.frame, and compute all the quantities you want in that function.

summary_text <- function(d) {
  do.call(rbind, lapply( d, function(u)
    data.frame(
      Type    = class(u)[1],
      Min     = if(is.numeric(u)) min(   u, na.rm=TRUE) else NA,
      Mean    = if(is.numeric(u)) mean(  u, na.rm=TRUE) else NA,
      Median  = if(is.numeric(u)) median(u, na.rm=TRUE) else NA,
      Max     = if(is.numeric(u)) max(   u, na.rm=TRUE) else NA,
      Missing = sum(is.na(u))
    )    
  ) )
}
summary_text(iris)
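
The question also asks for the column name and position. A rough sketch of a variant that adds them is below; column_overview and num_stat are names made up for this illustration, not part of the answer above or of any package.

```r
# Hypothetical helper: one row per column, with its position and type
column_overview <- function(d) {
  # Apply a statistic to numeric columns only; NA for everything else
  num_stat <- function(stat) {
    vapply(
      d,
      function(u) if (is.numeric(u)) as.numeric(stat(u, na.rm = TRUE)) else NA_real_,
      numeric(1)
    )
  }
  data.frame(
    Name     = names(d),
    Position = seq_along(d),
    Type     = vapply(d, function(u) class(u)[1], character(1)),
    Min      = num_stat(min),
    Max      = num_stat(max),
    Mean     = num_stat(mean),
    Median   = num_stat(median),
    Missing  = vapply(d, function(u) sum(is.na(u)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}
column_overview(iris)
```

Building each output column with vapply() keeps the result a plain data frame, so it can be sorted or filtered like any other table.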

But I personally prefer to look at the data graphically: the following function will draw a histogram and a quantile-quantile plot for each numeric variable, and a barplot for each factor, on a single page. If you have 20 to 30 variables, it should remain usable.

summary_plot <- function(d, aspect=1) {
  # Split the screen: find the optimal number of columns 
  # and rows to be as close as possible to the desired aspect ratio.
  n <- ncol(d)
  dx <- par()$din[1]
  dy <- par()$din[2]
  f <- function(u,v) {
    if( u*v >= n && (u-1)*v < n && u*(v-1) < n ) {
      abs(log((dx/u)/(dy/v)) - log(aspect))
    } else { 
      NA 
    }
  }
  f <- Vectorize(f)
  r <- outer( 1:n, 1:n, f )
  r <- which( r == min(r,na.rm=TRUE), arr.ind=TRUE )
  r <- r[1,2:1]

  op <- par(mfrow=c(1,1),mar=c(2,2,2,2))
  plot.new()
  if( is.null( names(d) ) ) { names(d) <- 1:ncol(d) }
  ij <- matrix(seq_len(prod(r)), nrow=r[1], ncol=r[2], byrow=TRUE)
  for(k in seq_len(ncol(d))) {
    i <- which(ij==k, arr.ind=TRUE)[1]
    j <- which(ij==k, arr.ind=TRUE)[2]
    i <- r[1] - i + 1
    f <- c(j-1,j,i-1,i) / c(r[2], r[2], r[1], r[1] )
    par(fig=f, new=TRUE)
    if(is.numeric(d[,k])) { 
      hist(d[,k], las=1, col="grey", main=names(d)[k], xlab="", ylab="")
      o <- par(fig=c(
          f[1]*.4  + f[2]*.6,
          f[1]*.15 + f[2]*.85,
          f[3]*.4  + f[4]*.6,
          f[3]*.15 + f[4]*.85
        ), 
        new=TRUE,
        mar=c(0,0,0,0)
      )
      qqnorm(d[,k],axes=FALSE,xlab="",ylab="",main="")
      qqline(d[,k])
      box()
      par(o)
    } else {
      o <- par(mar=c(2,5,2,2))
      barplot(table(d[,k]), horiz=TRUE, las=1, main=names(d)[k])
      par(o)
    }
  }
  par(op)
}
summary_plot(iris)

Upvotes: 7

IRTFM

Reputation: 263332

I suspect you just want:

lapply(tableName, class)

It's possible that you might think you want:

lapply(tableName, typeof)

... but typeof returns only the storage mode, which is less informative, because functions in R are dispatched on the 'class' of variables.
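
To illustrate the difference, here is a small sketch on a made-up data frame (the object d and its columns are invented for this example):

```r
# Toy data frame with a factor, a Date and a plain numeric column
d <- data.frame(
  f   = factor(c("a", "b")),
  day = as.Date(c("2020-01-01", "2020-01-02")),
  n   = c(1.5, 2.5)
)

sapply(d, class)   # f: "factor",  day: "Date",   n: "numeric"
sapply(d, typeof)  # f: "integer", day: "double", n: "double"
```

The factor and the Date column are only distinguishable from plain numbers through class; typeof reports how they are stored.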

Upvotes: 0

Josh O'Brien

Reputation: 162321

Sounds like you might be looking for something like describe(), from the Hmisc package. My recollection is that Frank Harrell (the package's author) was a long-time SAS programmer who came over to the R world fairly early on. The style of the summaries that describe() provides certainly seems to reflect that computing genealogy:

library(Hmisc)
describe(cars) # for example
cars 

 2  Variables      50  Observations
---------------------------------------------------------------------------------
speed 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     50       0      19    15.4     7.0     8.9    12.0    15.0    19.0    23.1 
    .95 
   24.0 

          4 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25
Frequency 2 2 1 1  3  2  4  4  4  3  2  3  4  3  5  1  1  4  1
%         4 4 2 2  6  4  8  8  8  6  4  6  8  6 10  2  2  8  2
---------------------------------------------------------------------------------
dist 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     50       0      35   42.98   10.00   15.80   26.00   36.00   56.00   80.40 
    .95 
  88.85 

lowest :   2   4  10  14  16, highest:  84  85  92  93 120 
---------------------------------------------------------------------------------

Upvotes: 8
