Max Gordon

Reputation: 5467

Descriptive tables - how to create a table containing both numeric and categorical variables

I can't find an intuitive way of doing the most basic thing: creating a summary table of my base variables. The best method I've found so far uses tapply:

set.seed(200)
my_stats <- function(x){
    if (is.factor(x)){
        a <- table(x, useNA="no")
        b <- round(a*100/sum(a),2)

        # If binary
        if (length(a) == 2){
            ret <- paste(a[1], " (", b[1], " %)", sep="")
        }
        return(ret)
    }else{
        ret <- mean(x, na.rm=T)
        if (ret < 1){
            ret <- round(ret, 2)
        }else{
            ret <- round(ret)
        }
        return(ret)
    }
}

library(rms)
groups <- factor(sample(c("Group A","Group B"), size=51, replace=T))
a <- 3:53 
b <- rnorm(51)
c <- factor(sample(c("male","female"), size=51, replace=T))

res <- rbind(a=tapply(a, groups, my_stats),
      b=tapply(b, groups, my_stats),
      c=tapply(c, groups, my_stats))
latex(latexTranslate(res))

The res contains:

> res
  Group A     Group B       
a "28"        "28"          
b "-0.08"     "-0.21"       
c "14 (56 %)" "14 (53.85 %)"

Now this works, but it seems very complex and not the most elegant solution. I've tried to search for how to create descriptive tables, but the results all focus on table(), prop.table(), and summary() for a single variable or for variables of the same kind.

My question: Is there a package/function that allows an easy way of creating a good-looking LaTeX table? If so, please give a hint of how to get the above result.

Thanks!

Upvotes: 2

Views: 2695

Answers (4)

Mike

Reputation: 4400

If you would like to create a summary table with both categorical and continuous variables, you should look into the package 'tableone'.

Here is an example of what it can do: https://rpubs.com/kaz_yos/tableone-vignette. And here is the PDF documentation: https://cran.r-project.org/web/packages/tableone/tableone.pdf
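As a rough sketch of how that might look for the data in the question (assuming the tableone and xtable packages are installed; dat, tab and mat are names made up for this example, and the data is regenerated here so the snippet runs on its own):

```r
library(tableone)
library(xtable)

# Recreate data shaped like the question's example (values depend on the seed).
set.seed(200)
dat <- data.frame(
  groups = factor(sample(c("Group A", "Group B"), size = 51, replace = TRUE)),
  a = 3:53,
  b = rnorm(51),
  c = factor(sample(c("male", "female"), size = 51, replace = TRUE))
)

# One column per level of `groups`; factors are summarised as counts and
# percentages, numeric variables as mean (SD).
tab <- CreateTableOne(vars = c("a", "b", "c"), strata = "groups", data = dat)

# print() returns the formatted table as a character matrix;
# printToggle = FALSE suppresses the console output, test = FALSE drops p-values.
mat <- print(tab, printToggle = FALSE, test = FALSE)

# Passing the matrix to xtable is one possible way to get LaTeX out of it.
print(xtable(mat))
```

The summaries are mean (SD) rather than the plain mean asked for, so the formatting would still need adjusting to match the original layout exactly.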

I hope this helps.

  • Mike

Upvotes: 2

Greg Snow

Reputation: 49660

Look at the tables package for another way that may make this simpler.
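A minimal sketch of what that might look like, assuming the tables package is installed and using regenerated data with the same shape as in the question (dat and tab are names chosen just for this example). The formula reads "rows ~ columns": the mean of a and b, plus counts for each level of c, split by groups.

```r
library(tables)

# Recreate data shaped like the question's example (values depend on the seed).
set.seed(200)
dat <- data.frame(
  groups = factor(sample(c("Group A", "Group B"), size = 51, replace = TRUE)),
  a = 3:53,
  b = rnorm(51),
  c = factor(sample(c("male", "female"), size = 51, replace = TRUE))
)

# (a + b) * mean gives one row per numeric variable with its mean;
# a bare factor such as c gives one row of counts per level.
tab <- tabular((a + b) * mean + c ~ groups, data = dat)
tab
```

The package documentation describes how to turn a tabular object into LaTeX and how to add percentages or number formatting; those steps are left out here.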

Upvotes: 2

Vincent Zoonekynd

Reputation: 32391

If you rewrite your function so that it always returns a string (at the moment it returns a string for binary factors, a number for numeric input, and fails for factors that are not binary, because ret is never assigned), you can call ddply on the data.frame without having to specify all the columns.

f <- function(u) {
  res <- "?" 
  if(is.factor(u) || is.character(u)) {
    u <- table(u, useNA = "no")
    if (length(u) == 0 || sum(u) == 0) { res <- "NA" }
    else { res <- sprintf( "%0.0f%%", 100 * u[1] / sum(u) ) }
  } else {
    u <- mean(u, na.rm=TRUE)
    if(is.na(u)) { res <- "NA" }
    else { res <- sprintf( ifelse( abs(u) < 1, "%0.2f", "%0.0f" ), u ) }
  }
  return( res )
}
# Same function, for data.frames
g <- function(d) do.call( data.frame, lapply(d, f) )

library(plyr)
ddply(data.frame(a,b,c), .(groups), g)
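If you would rather keep the layout from the question (variables as rows, groups as columns) and avoid the plyr dependency, the same idea can be sketched in base R. This is only an illustration: fmt_summary is a hypothetical helper written for this example, and the data is regenerated so the snippet is self-contained.

```r
# Hypothetical formatter: always returns exactly one string.
fmt_summary <- function(x) {
  if (is.factor(x) || is.character(x)) {
    counts <- table(x, useNA = "no")
    if (sum(counts) == 0) return("NA")
    # Count and share of the first level, e.g. "14 (56%)"
    sprintf("%d (%.0f%%)", counts[[1]], 100 * counts[[1]] / sum(counts))
  } else {
    m <- mean(x, na.rm = TRUE)
    if (is.na(m)) return("NA")
    # Two decimals for small means, none otherwise
    sprintf(if (abs(m) < 1) "%.2f" else "%.0f", m)
  }
}

# Recreate data shaped like the question's example (values depend on the seed).
set.seed(200)
dat <- data.frame(
  a = 3:53,
  b = rnorm(51),
  c = factor(sample(c("male", "female"), size = 51, replace = TRUE))
)
groups <- factor(sample(c("Group A", "Group B"), size = 51, replace = TRUE))

# split() yields one data frame per group; vapply() formats every column,
# and sapply() binds the per-group results into a character matrix.
res <- sapply(split(dat, groups),
              function(part) vapply(part, fmt_summary, character(1)))
res
```

Because every cell is already a string, res can be handed to a LaTeX table function without further conversion.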

Since you want LaTeX tables, you may also want to try the following, which does not group the data, but adds sparkline histograms for the numeric variables.

library(Hmisc)
d <- data.frame(a, b, c)
latex(describe(d), file="")

Upvotes: 2

joran

Reputation: 173677

What you're asking is a tad open ended, since there's the distinct possibility that you will disagree with me on what constitutes a "good-looking LaTeX table".

For instance, I would probably prefer to organize this by row, rather than by column:

require(plyr)
require(xtable)
dat <- data.frame(a,b,c,groups)
xtable(ddply(dat,.(groups),summarise,a = my_stats(a),
                                     b = my_stats(b),
                                     c = my_stats(c)))


\begin{table}[ht]
\begin{center}
\begin{tabular}{rlrrl}
  \hline
 & groups & a & b & c \\ 
  \hline
1 & Group A & 28.00 & 0.14 & 13 (52 \%) \\ 
  2 & Group B & 28.00 & -0.00 & 13 (50 \%) \\ 
   \hline
\end{tabular}
\end{center}
\end{table}

And of course, much of that is customizable if you look at ?xtable and also ?print.xtable.
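To give a flavour of that customization, here is a small sketch, assuming xtable is installed. The data frame tab is a stand-in typed in from the output above, and the caption and label are made up for the example.

```r
library(xtable)

# Stand-in for the summarised data frame, copied from the printed table above.
tab <- data.frame(
  groups = c("Group A", "Group B"),
  a = c(28, 28),
  b = c(0.14, 0),
  c = c("13 (52 %)", "13 (50 %)")
)

# digits takes one entry for the row names plus one per column:
# no decimals for a, two for b (it is ignored for text columns).
xt <- xtable(tab,
             caption = "Summary by group",   # made-up caption
             label = "tab:summary",          # made-up label
             digits = c(0, 0, 0, 2, 0))

# Drop the row numbers, use booktabs rules, and put the caption on top.
print(xt,
      include.rownames = FALSE,
      booktabs = TRUE,
      caption.placement = "top")
```

With booktabs = TRUE the LaTeX document needs \usepackage{booktabs}. The percent signs are escaped by the default sanitizer, as in the output above.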

Upvotes: 2
