kuekawa

Reputation: 47

How to turn the result of summary() into a nice looking data table

Using R's summary(), I want to build a table with the mean, standard deviation, n, min, and max for multiple variables. I will use mtcars (one of R's built-in datasets) as the example. For a single variable, this works well:

as.data.frame(t(unclass(summary(mtcars$disp))))

The result:

Min. 1st Qu. Median     Mean 3rd Qu. Max.
1 71.1 120.825  196.3 230.7219     326  472

With more than one variable, it doesn't work. I get the same result as above (only the summary for mtcars$disp is shown).

as.data.frame(t(unclass(summary(mtcars$disp,mtcars$hp,mtcars$drat))))

The result (the same as above):

Min. 1st Qu. Median     Mean 3rd Qu. Max.
1 71.1 120.825  196.3 230.7219     326  472

The ideal result should look like this.

Min. 1st Qu. Median     Mean 3rd Qu. Max.
71.1 120.825  196.3 230.7219     326  472
52    96.5    123 146.6875     180  335
2.76    3.08  3.695 3.596563    3.92 4.93

I would like the name of variables too:

Name  Min. 1st Qu. Median     Mean 3rd Qu. Max.
disp  71.1 120.825  196.3 230.7219     326  472
hp    52    96.5    123 146.6875     180  335
drat  2.76    3.08  3.695 3.596563    3.92 4.93

Could you advise? Also, in the last line of code I have to repeat mtcars$ many times. Is there a way to avoid this?

Thank you.

I asked a similar question here ("R question: how to save summary results into a dataset"), but the suggested code got very complicated. I'd like to stick with summary() if possible.

Upvotes: 2

Views: 2073

Answers (2)

Ronak Shah

Reputation: 388982

You could sapply over the columns and get the summary for each:

cols <- c("disp", "hp", "drat")
t(sapply(mtcars[cols], summary))

#      Min. 1st Qu.  Median       Mean 3rd Qu.   Max.
#disp 71.10 120.825 196.300 230.721875  326.00 472.00
#hp   52.00  96.500 123.000 146.687500  180.00 335.00
#drat  2.76   3.080   3.695   3.596563    3.92   4.93
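As a side note on why the call in the question shows only disp: as far as I can tell, summary() summarises its first argument only, and the default method for numeric vectors accepts further unnamed arguments through ... without using them. A small check, assuming the default method is the one dispatched here (the names one, many and same are just for illustration):

```r
# Summary of a single column.
one <- summary(mtcars$disp)

# Same call with two extra columns passed as additional arguments;
# they are absorbed by `...` and do not affect the result.
many <- summary(mtcars$disp, mtcars$hp, mtcars$drat)

# Compare the underlying values of the two results.
same <- identical(unclass(one), unclass(many))
same
```

That is why the columns have to be iterated over explicitly, for example with sapply as above.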

If you also need the names in a separate column

summary_df <- data.frame(t(sapply(mtcars[cols], summary)), check.names = FALSE)
summary_df$Name <- rownames(summary_df)
rownames(summary_df) <- NULL

summary_df
#   Min. 1st Qu.  Median       Mean 3rd Qu.   Max. Name
#1 71.10 120.825 196.300 230.721875  326.00 472.00 disp
#2 52.00  96.500 123.000 146.687500  180.00 335.00   hp
#3  2.76   3.080   3.695   3.596563    3.92   4.93 drat
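The question asked for Name as the first column. One possible variant, sketched here under the assumption that passing row.names = NULL to data.frame() resets the row names to 1, 2, 3 (the object name m is only illustrative):

```r
cols <- c("disp", "hp", "drat")

# One row per variable, one column per summary statistic.
m <- t(sapply(mtcars[cols], summary))

# Put the variable names first; check.names = FALSE keeps headers
# such as "1st Qu." unchanged, row.names = NULL drops the old row names.
summary_df <- data.frame(Name = rownames(m), m,
                         check.names = FALSE, row.names = NULL)
summary_df
```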

To add some additional statistics, we need to create a custom function

custom_summary <- function(x) {
  c(summary(x), length = length(x), nonmissing = sum(!is.na(x)), 
                sd = sd(x, na.rm = TRUE))
}
t(sapply(mtcars[cols], custom_summary))

#      Min. 1st Qu.  Median       Mean 3rd Qu.   Max. length nonmissing          sd
#disp 71.10 120.825 196.300 230.721875  326.00 472.00     32         32 123.9386938
#hp   52.00  96.500 123.000 146.687500  180.00 335.00     32         32  68.5628685
#drat  2.76   3.080   3.695   3.596563    3.92   4.93     32         32   0.5346787
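If only the statistics listed in the question (mean, sd, n, min, max) are needed, a sketch along the same lines can compute them directly instead of going through summary(). One reason to consider this: summary() appends an NA's entry for vectors that contain missing values, so columns with and without NAs would return vectors of different lengths. The helper name basic_stats is hypothetical, and n is taken here to mean the number of non-missing values:

```r
# Hypothetical helper: always returns the same five named statistics.
basic_stats <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    n    = sum(!is.na(x)),       # count of non-missing values
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE))
}

cols <- c("disp", "hp", "drat")

# vapply checks that every column yields exactly five numbers.
stats <- t(vapply(mtcars[cols], basic_stats, numeric(5)))
stats
```

vapply is used instead of sapply so that an unexpected result length fails loudly rather than silently producing a list.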

Upvotes: 3

kstew

Reputation: 1114

You can use dplyr and summarise(), which outputs a tidy tibble/data.frame and lets you easily specify which summary stats you want.

library(dplyr)  # %>%, select(), group_by(), summarise(), n()
library(tidyr)  # gather()

mtcars %>% select(disp,hp,drat) %>% 
  gather(k,v) %>% group_by(k) %>% 
  summarise(min=min(v),median=median(v),mean=mean(v),max=max(v),n=n())

# A tibble: 3 x 6
  k       min median   mean    max     n
  <chr> <dbl>  <dbl>  <dbl>  <dbl> <int>
1 disp  71.1  196.   231.   472       32
2 drat   2.76   3.70   3.60   4.93    32
3 hp    52    123    147.   335       32
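The tidyr documentation marks gather() as superseded by pivot_longer(). A sketch of the same pipeline with pivot_longer(), assuming tidyr 1.0.0 or later is installed; the column names Name and value and the object res are illustrative choices, and sd is added because the question asked for it:

```r
library(dplyr)
library(tidyr)

res <- mtcars %>%
  select(disp, hp, drat) %>%
  # Stack the three columns into (Name, value) pairs.
  pivot_longer(everything(), names_to = "Name", values_to = "value") %>%
  group_by(Name) %>%
  summarise(min = min(value), median = median(value), mean = mean(value),
            sd = sd(value), max = max(value), n = n())
res
```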

Upvotes: 3
