Reputation: 560
I am only in the beginning of the use of packages like xtable
or `stargazer. Bellow is a sample dataset, the original one is much larger.
set.seed(1)
df <- data.frame(rep(
sample(c(2012,2016),10, replace = T)),
sample(c('Treat','Control'),10,replace = T),
runif(10,0,1),
runif(10,0,1),
runif(10,0,1))
colnames(df) <- c('Year','Group','V1','V2','V3')
I would like to generate well-formatted tables displaying descriptive statistics for the data set above.
However, is there a way xtable
could directly generate a table in which I can see statistics for the whole dataset, separating by group (Treat and Control) and by year (2012, 2016)? Or maybe even by combining groups and years?
Or should I filter the original df according to these settings and run xtable
on each one?
Another thing I would want is to, instead of displaying the mean, display the median of the variables, among other statistics. Is it possible, or do I have to manually calculate that using R?
Any solutions considering stargazer
are valid!
Thank you!
Upvotes: 1
Views: 881
Reputation: 93811
Below are some examples of what you can do in an rmarkdown
document, although you can go a lot further in formatting the table than I've done here. I've included some examples using xtable
and then a couple of examples with the new kableExtra
package, which, unless you're already skilled with latex
, makes it easier to format complex table layouts when compared with xtable
. See the vignettes for xtable
and kableExtra
for more information.
For summarizing and reshaping the data, I've used functions from the dplyr
and tidyr
packages (which are part of the tidyverse
suite of packages). You can also summarize and reshape with base R functions, such as aggregate
and reshape
, and with the data.table
package.
rmarkdown
document---
title: "Tables"
author: "eipi10"
date: "7/7/2017"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, warning=FALSE)
library(xtable)
options(xtable.comment=FALSE, xtable.include.rownames=FALSE)
library(tidyverse)
```
```{r}
set.seed(1)
df <- data.frame(Year=rep(
sample(c(2012,2016), 10, replace = T)),
Group=sample(c('Treat','Control'),10,replace = T),
V1=runif(10,0,1),
V2=runif(10,0,1),
V3=runif(10,0,1))
```
```{r, results="asis"}
# Mean by Year and Group
summary.table = df %>%
group_by(Year, Group) %>%
summarise_all(funs(mean))
print(xtable(summary.table,
caption="Mean by Year and Group",
digits=c(1,0,0,2,2,2)))
```
```{r, results="asis"}
# Median by Year and Group
summary.table = df %>%
group_by(Year, Group) %>%
summarise_all(funs(median))
print(xtable(summary.table,
caption="Median by Year and Group",
digits=c(1,0,0,2,2,2)))
```
```{r, results="asis"}
# Mean and Median by Year and Group
summary.table = df %>%
group_by(Year, Group) %>%
summarise_all(funs(mean, median))
print(xtable(summary.table,
caption="Mean and Median by Year and Group",
digits=c(1,0,0,rep(2,6))))
```
```{r}
# Create a function that takes the summary function calls as arguments
xtab_fnc = function(data, caption, ...) {
# quosure to dispatch functions properly (see Programming with dplyr vignette)
funcs = quos(...)
summary.table = data %>%
group_by(Year, Group) %>%
summarise_all(funs(!!!funcs))
# Get rid of repeated years
for(i in nrow(summary.table):2) {
if(identical(summary.table$Year[i-1], summary.table$Year[i])) {
summary.table$Year[i] = ""
}
}
xtable(summary.table,
caption=caption,
digits=c(1,0,0,rep(2,ncol(summary.table) - 2)))
}
```
```{r, results="asis"}
# Run the function
print(xtab_fnc(df, "Mean, Median, and Sum by Year and Group", mean, median, sum),
size="scriptsize")
```
```{r, results="asis"}
# Run the function
print(xtab_fnc(df, "Mean by Year and Group", mean),
size="large")
```
```{r}
# You can do more complex formatting in xtable, but it's probably easier with kableExtra
library(knitr)
library(kableExtra)
```
```{r}
summary.table = df %>%
group_by(Year, Group) %>%
summarise_all(funs(mean,sd,min,median,max))
# Get rid of repeated years
for(i in nrow(summary.table):2) {
if(identical(summary.table$Year[i-1], summary.table$Year[i])) {
summary.table$Year[i] = ""
}
}
# Get rid of "_mean", and "_median" in column names
names(summary.table) = gsub("_.*","",names(summary.table))
# LaTeX Table
kable(summary.table, format = "latex",
booktabs = T, caption = "kableExtra to format spanning columns",
digits=c(0,0,rep(3,15))) %>%
kable_styling(latex_options = c("striped", "hold_position", "scale_down"),
full_width=F) %>%
add_header_above(c("","","Mean"=3,"SD"=3,"Min"=3,"Median[note]"=3,"Max"=3)) %>%
add_footnote(c("Note, means and medians are often the same with this data."))
```
```{r}
# Reshape table to turn V1-V3 into rows
summary.table = df %>%
group_by(Year, Group) %>%
summarise_all(funs(mean,sd,min,median,max)) %>%
gather(key, value, -Year, -Group) %>%
separate(key, into=c("var", "stat")) %>%
unite(stat_Group, stat, Group) %>%
spread(stat_Group, value)
# Get rid of repeated years
for(i in nrow(summary.table):2) {
if(identical(summary.table$Year[i-1], summary.table$Year[i])) {
summary.table$Year[i] = ""
}
}
names(summary.table) = gsub(".*_", "", names(summary.table))
# LaTeX Table
kable(summary.table, format = "latex",
booktabs = T, caption = "kableExtra to format spanning columns",
digits=c(0,0,rep(3,10))) %>%
kable_styling(latex_options = c("striped", "hold_position", "scale_down"),
full_width=F) %>%
add_header_above(c("","","Max"=2,"Mean"=2,"Median"=2,"Min"=2,"SD"=2))
```
```{r}
# Reshape table to turn V1-V3 into rows; also add a column summarizing all subjects
summary.table = df %>%
group_by(Year, Group) %>%
summarise_all(funs(mean,sd,min,median,max)) %>%
bind_rows(df %>% # bind_rows block adds the "All" column
mutate(Group="All") %>% # |
group_by(Year, Group) %>% # |
summarise_all(funs(mean,sd,min,median,max))) %>% # |
gather(key, value, -Year, -Group) %>%
separate(key, into=c("var", "stat")) %>%
unite(stat_Group, stat, Group) %>%
spread(stat_Group, value)
# Get rid of repeated years
for(i in nrow(summary.table):2) {
if(identical(summary.table$Year[i-1], summary.table$Year[i])) {
summary.table$Year[i] = ""
}
}
names(summary.table) = gsub(".*_", "", names(summary.table))
# LaTeX Table
kable(summary.table, format = "latex",
booktabs = T, caption = "kableExtra to format spanning columns; also add an 'All' column",
digits=c(0,0,rep(3,15))) %>%
kable_styling(latex_options = c("striped", "hold_position", "scale_down"),
full_width=F) %>%
add_header_above(c("","","Max"=3,"Mean"=3,"Median"=3,"Min"=3,"SD"=3))
```
Upvotes: 5