Reputation: 47
Using R's summary(), I want to make a table with the mean, standard deviation, n, min, and max of multiple variables. I'll use mtcars (a dataset built into R) as the example. For a single variable, this works well:
as.data.frame(t(unclass(summary(mtcars$disp))))
The result:
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 71.1 120.825 196.3 230.7219 326 472
With more than one variable it doesn't work: I get the same result as above (only mtcars$disp is summarized).
as.data.frame(t(unclass(summary(mtcars$disp,mtcars$hp,mtcars$drat))))
The result (the same as above):
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 71.1 120.825 196.3 230.7219 326 472
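A hedged explanation of why the extra vectors change nothing: summary() dispatches on its first argument only, and for a numeric vector summary.default() appears to absorb further positional arguments into ... without using them. A quick check, assuming a standard R installation:

```r
# summary() summarizes only its first argument (object); the extra
# vectors are swallowed by ... and ignored for numeric input.
one   <- summary(mtcars$disp)
three <- summary(mtcars$disp, mtcars$hp, mtcars$drat)
identical(one, three)
```

So summary() alone cannot produce one row per variable; it has to be called once per column.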
Ideally, the result would look like this:
Min. 1st Qu. Median Mean 3rd Qu. Max.
71.1 120.825 196.3 230.7219 326 472
52 96.5 123 146.6875 180 335
2.76 3.08 3.695 3.596563 3.92 4.93
I would also like the variable names:
Name Min. 1st Qu. Median Mean 3rd Qu. Max.
disp 71.1 120.825 196.3 230.7219 326 472
hp 52 96.5 123 146.6875 180 335
drat 2.76 3.08 3.695 3.596563 3.92 4.93
Could you advise? Also, in the last call I have to repeat mtcars$ many times. Is there a way to avoid this?
Thank you.
I asked a similar question before (R question: how to save summary results into a dataset), but the suggested code got very complicated. I'd like to stick with summary() if possible.
Upvotes: 2
Views: 2073
Reputation: 388982
You could sapply() over the columns and get the summary for each one:
cols <- c("disp", "hp", "drat")
t(sapply(mtcars[cols], summary))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
#disp 71.10 120.825 196.300 230.721875 326.00 472.00
#hp 52.00 96.500 123.000 146.687500 180.00 335.00
#drat 2.76 3.080 3.695 3.596563 3.92 4.93
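On the question's second point (not repeating mtcars$): selecting columns by name with mtcars[cols], as above, already avoids it. If you prefer one explicit summary() call per variable, base R's with() is another option, since it evaluates an expression with the data frame's columns in scope. A minimal sketch:

```r
# with() lets disp, hp and drat be referenced by bare name;
# rbind() stacks the three named summaries into a numeric matrix.
res <- with(mtcars, rbind(disp = summary(disp),
                          hp   = summary(hp),
                          drat = summary(drat)))
res
```

This scales worse than sapply() as the number of columns grows, but it reads close to the original one-variable call.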
If you also need the names in a separate column:
summary_df <- data.frame(t(sapply(mtcars[cols], summary)), check.names = FALSE)
summary_df$Name <- rownames(summary_df)
rownames(summary_df) <- NULL
summary_df
# Min. 1st Qu. Median Mean 3rd Qu. Max. Name
#1 71.10 120.825 196.300 230.721875 326.00 472.00 disp
#2 52.00 96.500 123.000 146.687500 180.00 335.00 hp
#3 2.76 3.080 3.695 3.596563 3.92 4.93 drat
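The question asked for Name as the first column. One way to get that directly is to pass the names to data.frame() ahead of the matrix; this is a sketch (the variable name stat_mat is just illustrative):

```r
cols <- c("disp", "hp", "drat")
stat_mat <- t(sapply(mtcars[cols], summary))

# Name goes first; row.names = NULL resets the row names to 1, 2, 3
# instead of reusing disp/hp/drat from the matrix.
summary_df <- data.frame(Name = rownames(stat_mat), stat_mat,
                         check.names = FALSE, row.names = NULL,
                         stringsAsFactors = FALSE)
summary_df
```

check.names = FALSE keeps the original column labels such as "1st Qu." instead of converting them to syntactic names.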
summary() does not report the count or the standard deviation, so to add those you need a custom function:
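custom_summary <- function(x) {
  # summary() gives Min./quartiles/Median/Mean/Max.; append the total
  # length, the number of non-missing values and the standard deviation
  c(summary(x), length = length(x), nonmissing = sum(!is.na(x)),
    sd = sd(x, na.rm = TRUE))
}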
t(sapply(mtcars[cols], custom_summary))
# Min. 1st Qu. Median Mean 3rd Qu. Max. length nonmissing sd
#disp 71.10 120.825 196.300 230.721875 326.00 472.00 32 32 123.9386938
#hp 52.00 96.500 123.000 146.687500 180.00 335.00 32 32 68.5628685
#drat 2.76 3.080 3.695 3.596563 3.92 4.93 32 32 0.5346787
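One caveat, offered as a hedged aside: summary() appends an extra "NA's" entry only for columns that contain missing values. If some selected columns have NAs and others don't, the results differ in length, and sapply() returns a list instead of a matrix. A sketch that sidesteps this by always returning the same five statistics (the helper name fixed_summary and the injected NA are illustrative assumptions, not part of the answer above):

```r
cols <- c("disp", "hp", "drat")
df <- mtcars[cols]
df$hp[1] <- NA  # illustrative only: introduce one missing value

# With mixed NA/non-NA columns the summary() results differ in length,
# so sapply() cannot simplify and returns a list.
str(sapply(df, summary))

# Hypothetical helper returning a fixed set of named statistics
fixed_summary <- function(x) {
  c(n    = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE))
}

# vapply() enforces that every column yields a numeric vector of length 5
res <- t(vapply(df, fixed_summary, numeric(5)))
res
```

vapply() takes the row labels from the names of the first result, so the statistic names carry through to the transposed matrix.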
Upvotes: 3
Reputation: 1114
You can use dplyr and summarise(), which output a tidy tibble/data.frame and let you specify exactly which summary statistics you want.
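library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr
mtcars %>% select(disp, hp, drat) %>%
  gather(k, v) %>%  # long format: k = variable name, v = value
  group_by(k) %>%
  summarise(min = min(v), median = median(v), mean = mean(v), max = max(v), n = n())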
# A tibble: 3 x 6
k min median mean max n
<chr> <dbl> <dbl> <dbl> <dbl> <int>
1 disp 71.1 196. 231. 472 32
2 drat 2.76 3.70 3.60 4.93 32
3 hp 52 123 147. 335 32
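gather() still works but is superseded in current tidyr. Assuming tidyr 1.0 or later, the same pipeline can be sketched with pivot_longer(), here also adding the sd the question asked for and calling the key column Name:

```r
library(dplyr)
library(tidyr)

res <- mtcars %>%
  select(disp, hp, drat) %>%
  # long format: one row per (variable, value) pair
  pivot_longer(everything(), names_to = "Name", values_to = "v") %>%
  group_by(Name) %>%
  summarise(n = n(), mean = mean(v), sd = sd(v), min = min(v), max = max(v))
res
```

As with the gather() version, group_by() sorts the groups alphabetically, so drat comes before hp.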
Upvotes: 3