Arthur Carvalho Brito
Arthur Carvalho Brito

Reputation: 560

How to approach the generation of tables based on conditionals using xtable

I am only in the beginning of the use of packages like xtable or `stargazer. Bellow is a sample dataset, the original one is much larger.

set.seed(1)
df <- data.frame(rep(
    sample(c(2012,2016),10, replace = T)),
    sample(c('Treat','Control'),10,replace = T),
    runif(10,0,1),
    runif(10,0,1),
    runif(10,0,1))

colnames(df) <- c('Year','Group','V1','V2','V3')

I would like to generate well-formatted tables displaying descriptive statistics for the data set above.

However, is there a way xtable could directly generate a table in which I can see statistics for the whole dataset, separating by group (Treat and Control) and by year (2012, 2016)? Or maybe even by combining groups and years?

Or should I filter the original df according to these settings and run xtable on each one?

Another thing I would want is to, instead of displaying the mean, display the median of the variables, among other statistics. Is it possible, or do I have to manually calculate that using R?

Any solutions considering stargazer are valid!

Thank you!

Upvotes: 1

Views: 881

Answers (1)

eipi10
eipi10

Reputation: 93811

Below are some examples of what you can do in an rmarkdown document, although you can go a lot further in formatting the table than I've done here. I've included some examples using xtable and then a couple of examples with the new kableExtra package, which, unless you're already skilled with latex, makes it easier to format complex table layouts when compared with xtable. See the vignettes for xtable and kableExtra for more information.

For summarizing and reshaping the data, I've used functions from the dplyr and tidyr packages (which are part of the tidyverse suite of packages). You can also summarize and reshape with base R functions, such as aggregate and reshape, and with the data.table package.

rmarkdown document

---
title: "Tables"
author: "eipi10"
date: "7/7/2017"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, warning=FALSE)
library(xtable)
options(xtable.comment=FALSE, xtable.include.rownames=FALSE)
library(tidyverse)
```

```{r}
set.seed(1)
df <- data.frame(Year=rep(
    sample(c(2012,2016), 10, replace = T)),
    Group=sample(c('Treat','Control'),10,replace = T),
    V1=runif(10,0,1),
    V2=runif(10,0,1),
    V3=runif(10,0,1))
```

```{r, results="asis"}
# Mean by Year and Group
summary.table = df %>% 
               group_by(Year, Group) %>%
               summarise_all(funs(mean))

print(xtable(summary.table, 
             caption="Mean by Year and Group",
             digits=c(1,0,0,2,2,2)))
```

```{r, results="asis"}
# Median by Year and Group
summary.table = df %>% 
               group_by(Year, Group) %>%
               summarise_all(funs(median))

print(xtable(summary.table, 
             caption="Median by Year and Group",
             digits=c(1,0,0,2,2,2)))
```

```{r, results="asis"}
# Mean and Median by Year and Group
summary.table = df %>% 
               group_by(Year, Group) %>%
               summarise_all(funs(mean, median))

print(xtable(summary.table, 
             caption="Mean and Median by Year and Group",
             digits=c(1,0,0,rep(2,6))))
```

```{r}
# Create a function that takes the summary function calls as arguments
xtab_fnc = function(data, caption, ...) {

  # quosure to dispatch functions properly (see Programming with dplyr vignette)
  funcs = quos(...)

  summary.table = data %>% 
    group_by(Year, Group) %>%
    summarise_all(funs(!!!funcs))

  # Get rid of repeated years
  for(i in nrow(summary.table):2) {
    if(identical(summary.table$Year[i-1], summary.table$Year[i])) {
      summary.table$Year[i] = ""
    }
  }

  xtable(summary.table, 
         caption=caption,
         digits=c(1,0,0,rep(2,ncol(summary.table) - 2)))
}
```

```{r, results="asis"}
# Run the function
print(xtab_fnc(df, "Mean, Median, and Sum by Year and Group", mean, median, sum),
      size="scriptsize")
```

```{r, results="asis"}
# Run the function
print(xtab_fnc(df, "Mean by Year and Group", mean),
      size="large")
```

```{r}
# You can do more complex formatting in xtable, but it's probably easier with kableExtra
library(knitr)
library(kableExtra)
```

```{r}
summary.table = df %>% 
  group_by(Year, Group) %>%
  summarise_all(funs(mean,sd,min,median,max))

# Get rid of repeated years
for(i in nrow(summary.table):2) {
  if(identical(summary.table$Year[i-1], summary.table$Year[i])) {
    summary.table$Year[i] = ""
  }
}

# Get rid of "_mean", and "_median" in column names
names(summary.table) = gsub("_.*","",names(summary.table))

# LaTeX Table
kable(summary.table, format = "latex", 
      booktabs = T, caption = "kableExtra to format spanning columns",
      digits=c(0,0,rep(3,15))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "scale_down"),
                full_width=F) %>%
  add_header_above(c("","","Mean"=3,"SD"=3,"Min"=3,"Median[note]"=3,"Max"=3)) %>%
  add_footnote(c("Note, means and medians are often the same with this data."))
```

```{r}
# Reshape table to turn V1-V3 into rows
summary.table = df %>% 
  group_by(Year, Group) %>%
  summarise_all(funs(mean,sd,min,median,max)) %>% 
  gather(key, value, -Year, -Group) %>%
  separate(key, into=c("var", "stat")) %>%
  unite(stat_Group, stat, Group) %>%
  spread(stat_Group, value) 

# Get rid of repeated years
for(i in nrow(summary.table):2) {
  if(identical(summary.table$Year[i-1], summary.table$Year[i])) {
    summary.table$Year[i] = ""
  }
}

names(summary.table) = gsub(".*_", "", names(summary.table))

# LaTeX Table
kable(summary.table, format = "latex", 
      booktabs = T, caption = "kableExtra to format spanning columns",
      digits=c(0,0,rep(3,10))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "scale_down"),
                full_width=F) %>%
  add_header_above(c("","","Max"=2,"Mean"=2,"Median"=2,"Min"=2,"SD"=2)) 
```

```{r}
# Reshape table to turn V1-V3 into rows; also add a column summarizing all subjects
summary.table = df %>% 
  group_by(Year, Group) %>%
  summarise_all(funs(mean,sd,min,median,max)) %>% 
  bind_rows(df %>%                                              # bind_rows block adds the "All" column
              mutate(Group="All") %>%                           # | 
              group_by(Year, Group) %>%                         # |
              summarise_all(funs(mean,sd,min,median,max))) %>%  # |
  gather(key, value, -Year, -Group) %>%
  separate(key, into=c("var", "stat")) %>%
  unite(stat_Group, stat, Group) %>%
  spread(stat_Group, value) 

# Get rid of repeated years
for(i in nrow(summary.table):2) {
  if(identical(summary.table$Year[i-1], summary.table$Year[i])) {
    summary.table$Year[i] = ""
  }
}

names(summary.table) = gsub(".*_", "", names(summary.table))

# LaTeX Table
kable(summary.table, format = "latex", 
      booktabs = T, caption = "kableExtra to format spanning columns; also add an 'All' column",
      digits=c(0,0,rep(3,15))) %>%
  kable_styling(latex_options = c("striped", "hold_position", "scale_down"),
                full_width=F) %>%
  add_header_above(c("","","Max"=3,"Mean"=3,"Median"=3,"Min"=3,"SD"=3)) 
```

PDF Output File

enter image description here enter image description here

Upvotes: 5

Related Questions