Kaizen502

How to create a summary statistics table with two groups using stargazer?

I have spent hours trying to figure out how to create a summary statistics table grouped by a categorical variable in R with the stargazer package.

Basically, I want to display the means of two groups (control and treatment) side by side and additionally show the difference between the two groups.
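To make the target layout concrete, here is a minimal base R sketch of such a table: one row per variable, one column of means per group, and a difference column. It is an illustration, not the asker's code. `group_mean_table()` is a hypothetical helper that assumes the grouping variable has exactly two levels, and the example uses the built-in mtcars data with `am` as the grouping variable (0 = automatic, 1 = manual).

```r
# Hypothetical helper (not part of any package): side-by-side group means and
# their difference for a grouping variable with exactly two levels.
group_mean_table <- function(data, group, vars, labels = NULL) {
  g <- data[[group]]
  levs <- sort(unique(g))
  if (length(levs) != 2) stop("the grouping variable must have exactly two levels")
  if (is.null(labels)) labels <- as.character(levs)

  # Column means of the selected variables within each group
  m0 <- colMeans(data[g == levs[1], vars, drop = FALSE])
  m1 <- colMeans(data[g == levs[2], vars, drop = FALSE])

  # Difference is second group minus first group
  out <- data.frame(Variable = vars, m0, m1, Difference = m1 - m0,
                    row.names = NULL)
  names(out)[2:3] <- labels  # e.g. "Automatic", "Manual"
  out
}

# In mtcars, am = 0 is automatic and am = 1 is manual
tab <- group_mean_table(mtcars, "am", c("mpg", "disp", "hp"),
                        labels = c("Automatic", "Manual"))
print(tab)
```

The result is an ordinary data frame, so it can be handed to a table renderer; with stargazer, summary = FALSE prints it as-is instead of computing new statistics.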

Whenever I try to create this table with stargazer, it produces a separate table for each group and stacks them one below the other.

I created an example with the mtcars data set, treating the variable am as the grouping variable:

library(dplyr)
library(stargazer)

data = mtcars

# In mtcars, am = 0 is automatic and am = 1 is manual
auto1 = data %>%
  filter(am == 0) %>%
  dplyr::select(mpg, disp, hp)

manu1 = data %>%
  filter(am == 1) %>%
  dplyr::select(mpg, disp, hp)

stargazer(auto1, manu1, type = "html", out = "summary.html", summary.stat = c("mean"), summary = TRUE)

Since that did not work as expected, I built the summary table manually and set summary = FALSE in stargazer so that it just outputs an HTML table:

library(reshape2)  # provides melt()

# In mtcars, am = 0 is automatic and am = 1 is manual
auto = data %>%
  filter(am == 0) %>%
  summarise(across(everything(), mean)) %>%
  melt(id.vars = "am")

manu = data %>%
  filter(am == 1) %>%
  summarise(across(everything(), mean)) %>%
  melt(id.vars = "am")

# Put both long tables side by side and drop the duplicated key columns
end = dplyr::select(data.frame(auto, manu), -c(am, am.1, variable.1))
end$diff = end$value.1 - end$value  # manual minus automatic
names(end) = c("Variable", "Automatic", "Manual", "Difference")

stargazer(end, type = "html", out = "summary.html", summary = FALSE)

This is probably not a very neat way of creating the summary table I want, but I couldn't come up with a better one myself. Any suggestions on how this could work with stargazer or a different package?


Answers (1)

jamieRowen

I'm not entirely sure what your desired output is, but does this help?

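library(dplyr)
library(tidyr)

mtcars %>% 
  group_by(am) %>%
  summarise(mpg = mean(mpg), disp = mean(disp), hp = mean(hp)) %>%  # one row per group
  gather(key = "variable", value = "value", mpg, disp, hp) %>%      # long format
  spread(am, value) %>%                                             # one column per group
  group_by(variable) %>%
  mutate(difference = `1` - `0`)                                    # manual minus automatic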

## Source: local data frame [3 x 4]
## Groups: variable [3]
##
##   variable       `0`       `1`  difference
##      <chr>     <dbl>     <dbl>       <dbl>
## 1     disp 290.37895 143.53077 -146.848178
## 2       hp 160.26316 126.84615  -33.417004
## 3      mpg  17.14737  24.39231    7.244939
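A variant of the same pipeline, sketched under the assumption of dplyr 1.0 or later and tidyr 1.0 or later: pivot_longer() and pivot_wider() supersede gather() and spread(), and across() avoids listing each mean by hand. Renaming the group columns to Automatic/Manual and the final stargazer call (summary = FALSE, rownames = FALSE) are additions for illustration, not part of the answer above; converting the tibble to a plain data frame before calling stargazer is a cautious choice rather than a documented requirement.

```r
library(dplyr)
library(tidyr)
library(stargazer)

# Means per transmission group; in mtcars, am = 0 is automatic, am = 1 is manual
tab <- mtcars %>%
  group_by(am) %>%
  summarise(across(c(mpg, disp, hp), mean)) %>%
  # long format: one row per (am, variable) pair
  pivot_longer(-am, names_to = "variable", values_to = "value") %>%
  # wide format: one column per group, named am_0 and am_1
  pivot_wider(names_from = am, values_from = value, names_prefix = "am_") %>%
  rename(Automatic = am_0, Manual = am_1) %>%
  mutate(Difference = Manual - Automatic)

# summary = FALSE prints the data frame as-is; rownames = FALSE hides 1, 2, 3
out <- capture.output(
  stargazer(as.data.frame(tab), type = "text", summary = FALSE,
            rownames = FALSE, digits = 2)
)
cat(out, sep = "\n")
```

For the HTML file from the question, switch to type = "html" and add out = "summary.html".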
