Reputation: 67
I have been using SAS for 6 years and am now migrating across to R. In SAS I used proc contents to get a healthy description of a table, including each variable's characteristics and data type.
Using str(tableName) I can see the type but not the vector position in a data frame.
Using names(tableName) I can see the names and positions of the vectors but not the type.
Using summary(tableName) I can see the quantiles/categories but not, easily, the type or the vector position.
Is there a way I can just get a list of Name, vectorPosition, type, min, max, avg, med [..]?
Upvotes: 3
Views: 3879
Reputation: 13080
This is really "quick 'n' dirty", but if I understood you correctly, this is kind of what you're after.
As an example, I took the info returned by summary() and simply added information about class and mode for each data frame column. I'm not really familiar with the table class in R, hence the formatting is really off.
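df <- data.frame(
    a=1:5,
    b=rep(TRUE, 5),
    c=letters[1:5],
    stringsAsFactors=TRUE  # keep 'c' a factor, as in the output below (not the default since R 4.0)
)
mySummary <- function(x, ...) {
    a <- summary(x)  # character table with one column per variable
    out <- NULL
    for (ii in seq_len(ncol(x))) {
        # Prepend class and mode to the summary() entries of column ii
        temp <- list(
            c(paste("Class:", class(x[,ii])), paste("Mode:", mode(x[,ii])),
              c(a[,ii]))
        )
        names(temp) <- names(x)[ii]
        out <- c(out, temp)
    }
    out
}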
> mySummary(df)
$a
"Class: integer" "Mode: numeric" "Min. :1 " "1st Qu.:2 "
"Median :3 " "Mean :3 " "3rd Qu.:4 " "Max. :5 "
$b
"Class: logical" "Mode: logical" "Mode:logical " "TRUE:5 "
"NA's:0 " NA NA NA
$c
"Class: factor" "Mode: numeric" "a:1 " "b:1 " "c:1 "
"d:1 " "e:1 " NA
You might want to check out how the summary() method for class data.frame is defined and then go ahead and tweak it to fit your needs.
Find out which methods are defined for summary()
methods("summary")
> methods("summary")
[1] summary.aov summary.aovlist summary.aspell*
[4] summary.connection summary.data.frame summary.Date
[7] summary.default summary.ecdf* summary.factor
[10] summary.glm summary.infl summary.lm
[13] summary.loess* summary.manova summary.matrix
[16] summary.mlm summary.nls* summary.packageStatus*
[19] summary.PDF_Dictionary* summary.PDF_Stream* summary.POSIXct
[22] summary.POSIXlt summary.ppr* summary.prcomp*
[25] summary.princomp* summary.srcfile summary.srcref
[28] summary.stepfun summary.stl* summary.table
[31] summary.tukeysmooth*
Non-visible functions are asterisked
Here's a way to get to the code
summary.data.frame
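Typing the method name only works for visible methods. For the asterisked (non-visible) ones in the listing above, a minimal sketch using getS3method() and getAnywhere() from the utils package might look like this; summary.ecdf is just one example picked from the list, and the variable names m and hidden are illustrative:

```r
# Visible method: fetch it by generic and class, then inspect it.
m <- getS3method("summary", "data.frame")
head(deparse(m), 5)   # first few lines of the source

# Non-visible (asterisked) method: typing summary.ecdf at the prompt
# fails, but getS3method() and getAnywhere() can still find it.
hidden <- getS3method("summary", "ecdf")
getAnywhere("summary.ecdf")
```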
Upvotes: 2
Reputation: 32351
You can use lapply to call a function on each column of the data.frame, and compute all the quantities you want in that function.
summary_text <- function(d) {
  # One single-row data.frame per column, stacked with rbind
  do.call(rbind, lapply( d, function(u)
    data.frame(
      Type    = class(u)[1],
      Min     = if(is.numeric(u)) min(   u, na.rm=TRUE) else NA,
      Mean    = if(is.numeric(u)) mean(  u, na.rm=TRUE) else NA,
      Median  = if(is.numeric(u)) median(u, na.rm=TRUE) else NA,
      Max     = if(is.numeric(u)) max(   u, na.rm=TRUE) else NA,
      Missing = sum(is.na(u))
    )
  ) )
}
summary_text(iris)
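The question also asks for the name and the vector position of each column. A sketch of the same lapply idea that iterates over column indices instead of the columns themselves could look like the following; the function name summary_with_position and the column labels Name and Position are my own choices, not anything standard:

```r
summary_with_position <- function(d) {
  rows <- lapply(seq_along(d), function(i) {
    u <- d[[i]]
    num <- is.numeric(u)
    data.frame(
      Name     = names(d)[i],
      Position = i,                       # column index in the data frame
      Type     = class(u)[1],
      Min      = if (num) min(u, na.rm = TRUE) else NA,
      Mean     = if (num) mean(u, na.rm = TRUE) else NA,
      Median   = if (num) median(u, na.rm = TRUE) else NA,
      Max      = if (num) max(u, na.rm = TRUE) else NA,
      Missing  = sum(is.na(u)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

res <- summary_with_position(iris)
res
```

Note that a numeric column containing only NA values will make min() and max() emit a warning and return Inf/-Inf; this sketch does not guard against that case.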
But I personally prefer to look at the data graphically: the following function will draw a histogram and a quantile-quantile plot for each numeric variable, and a barplot for each factor, on a single page. If you have 20 to 30 variables, it should remain usable.
summary_plot <- function(d, aspect=1) {
  # Split the screen: find the optimal number of columns
  # and rows to be as close as possible to the desired aspect ratio.
  n <- ncol(d)
  dx <- par()$din[1]
  dy <- par()$din[2]
  f <- function(u,v) {
    if( u*v >= n && (u-1)*v < n && u*(v-1) < n ) {
      abs(log((dx/u)/(dy/v)) - log(aspect))
    } else {
      NA
    }
  }
  f <- Vectorize(f)
  r <- outer( 1:n, 1:n, f )
  r <- which( r == min(r,na.rm=TRUE), arr.ind=TRUE )
  r <- r[1,2:1]
  op <- par(mfrow=c(1,1),mar=c(2,2,2,2))
  plot.new()
  if( is.null( names(d) ) ) { names(d) <- 1:ncol(d) }
  ij <- matrix(seq_len(prod(r)), nrow=r[1], ncol=r[2], byrow=TRUE)
  for(k in seq_len(ncol(d))) {
    # Locate the k-th cell of the grid and convert it to figure coordinates
    i <- which(ij==k, arr.ind=TRUE)[1]
    j <- which(ij==k, arr.ind=TRUE)[2]
    i <- r[1] - i + 1
    f <- c(j-1,j,i-1,i) / c(r[2], r[2], r[1], r[1] )
    par(fig=f, new=TRUE)
    if(is.numeric(d[,k])) {
      hist(d[,k], las=1, col="grey", main=names(d)[k], xlab="", ylab="")
      # Inset the quantile-quantile plot in the upper-right part of the cell
      o <- par(fig=c(
          f[1]*.4  + f[2]*.6,
          f[1]*.15 + f[2]*.85,
          f[3]*.4  + f[4]*.6,
          f[3]*.15 + f[4]*.85
        ),
        new=TRUE,
        mar=c(0,0,0,0)
      )
      qqnorm(d[,k],axes=FALSE,xlab="",ylab="",main="")
      qqline(d[,k])
      box()
      par(o)
    } else {
      o <- par(mar=c(2,5,2,2))
      barplot(table(d[,k]), horiz=TRUE, las=1, main=names(d)[k])
      par(o)
    }
  }
  par(op)
}
summary_plot(iris)
Upvotes: 7
Reputation: 263332
I suspect you just want:
lapply(tableName, class)
It's possible that you might think you want:
lapply(tableName, typeof)
... but typeof returns only the storage mode, which is less informative, because functions in R are dispatched on the 'class' of variables.
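To make the difference concrete, here is a small sketch on a made-up data frame (the name d and its columns are purely illustrative). A factor and a Date column each report a storage type that says little about how R will treat them:

```r
d <- data.frame(
  n    = c(1.5, 2),
  f    = factor(c("a", "b")),
  when = as.Date(c("2020-01-01", "2020-01-02"))
)

cls <- sapply(d, class)   # n: "numeric", f: "factor",  when: "Date"
typ <- sapply(d, typeof)  # n: "double",  f: "integer", when: "double"
cls
typ
```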
Upvotes: 0
Reputation: 162321
Sounds like you might be looking for something like describe(), from the Hmisc package. My recollection is that Frank Harrell (the package's author) was a long-time SAS programmer who came over to the R world fairly early on. The style of the summaries that describe() provides certainly seems to reflect that computing genealogy:
library(Hmisc)
describe(cars) # for example
cars
2 Variables 50 Observations
---------------------------------------------------------------------------------
speed
n missing unique Mean .05 .10 .25 .50 .75 .90
50 0 19 15.4 7.0 8.9 12.0 15.0 19.0 23.1
.95
24.0
4 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25
Frequency 2 2 1 1 3 2 4 4 4 3 2 3 4 3 5 1 1 4 1
% 4 4 2 2 6 4 8 8 8 6 4 6 8 6 10 2 2 8 2
---------------------------------------------------------------------------------
dist
n missing unique Mean .05 .10 .25 .50 .75 .90
50 0 35 42.98 10.00 15.80 26.00 36.00 56.00 80.40
.95
88.85
lowest : 2 4 10 14 16, highest: 84 85 92 93 120
---------------------------------------------------------------------------------
Upvotes: 8