Christian Conroy

Reputation: 5

Specific Summary Statistics for Multiple Variables by Factor Level

I am trying to get the mean, sd, min, max, and range for mpg, price, weight, and repair record, grouped by the two factor levels (domestic and foreign) of a variable called foreign. I've come across many examples that show how to get one statistic, like the mean, for multiple variables, or how to get multiple statistics for one variable grouped by two factor levels. However, I haven't found anything that shows how to build the table I've described above.

I've tried many things, and it appears that ddply might be what I should be using. I think it should be something like ddply(df, [column I want to use as factor level], mean = mean(value), ...), but I am unsure of the syntax. Thanks for any help!

Upvotes: 0

Views: 1107

Answers (1)

Kevin Arseneau

Reputation: 6264

I would favour a tidyverse approach, such as:

library(tibble)
library(dplyr)

mtcars %>%
  rownames_to_column() %>%
  as_tibble() %>%
  group_by(rowname) %>%
  summarise_all(
    funs(mean = mean, median = median, min = min, max = max, sd = sd)
  )

# # A tibble: 32 x 56
#              rowname mpg_mean cyl_mean disp_mean hp_mean drat_mean wt_mean qsec_mean
#                <chr>    <dbl>    <dbl>     <dbl>   <dbl>     <dbl>   <dbl>     <dbl>
# 1        AMC Javelin     15.2        8     304.0     150      3.15   3.435     17.30
# 2 Cadillac Fleetwood     10.4        8     472.0     205      2.93   5.250     17.98
# 3         Camaro Z28     13.3        8     350.0     245      3.73   3.840     15.41
# 4  Chrysler Imperial     14.7        8     440.0     230      3.23   5.345     17.42
# 5         Datsun 710     22.8        4     108.0      93      3.85   2.320     18.61
# 6   Dodge Challenger     15.5        8     318.0     150      2.76   3.520     16.87
# 7         Duster 360     14.3        8     360.0     245      3.21   3.570     15.84
# 8       Ferrari Dino     19.7        6     145.0     175      3.62   2.770     15.50
# 9           Fiat 128     32.4        4      78.7      66      4.08   2.200     19.47
# 10         Fiat X1-9     27.3        4      79.0      66      4.08   1.935     18.90
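
Note that grouping by rowname puts a single car in each group, so the statistics above simply repeat the raw values (and sd is NA for a group of one). A minimal sketch closer to the question might group by a two-level factor and restrict the summary to a few columns. Here `am` stands in for `foreign`, and `mpg`, `hp` and `wt` stand in for the question's variables; these substitutions are assumptions, since the asker's data is not shown. It also assumes dplyr 1.0.0 or later, where `across()` is available (`funs()` has been deprecated since dplyr 0.8.0).

```r
library(dplyr)

# Assumption: `am` (0 = automatic, 1 = manual) plays the role of the
# two-level `foreign` factor from the question.
stats_by_group <- mtcars %>%
  mutate(am = factor(am, labels = c("automatic", "manual"))) %>%
  group_by(am) %>%
  summarise(
    across(
      c(mpg, hp, wt),                   # columns to summarise
      list(
        mean  = mean,
        sd    = sd,
        min   = min,
        max   = max,
        range = ~ max(.x) - min(.x)     # range as a single number
      )
    )
  )

stats_by_group
```

This should give one row per level of `am`, with columns named like `mpg_mean`, `mpg_sd`, ..., `wt_range`.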

...or using summarise_if with the is.numeric predicate:

library(dplyr)

starwars %>%
  group_by(homeworld) %>%
  summarise_if(
    is.numeric,
    funs(mean = mean, median = median, min = min, max = max, sd = sd)
  )

# # A tibble: 49 x 16
#        homeworld height_mean mass_mean birth_year_mean height_median mass_median birth_year_median height_min
#            <chr>       <dbl>     <dbl>           <dbl>         <dbl>       <dbl>             <dbl>      <dbl>
# 1       Alderaan    176.3333        NA              NA           188          NA                NA        150
# 2    Aleen Minor     79.0000      15.0              NA            79        15.0                NA         79
# 3         Bespin    175.0000      79.0              37           175        79.0                37        175
# 4     Bestine IV    180.0000     110.0              NA           180       110.0                NA        180
# 5 Cato Neimoidia    191.0000      90.0              NA           191        90.0                NA        191
# 6          Cerea    198.0000      82.0              92           198        82.0                92        198
# 7       Champala    196.0000        NA              NA           196          NA                NA        196
# 8      Chandrila    150.0000        NA              48           150          NA                48        150
# 9   Concord Dawn    183.0000      79.0              66           183        79.0                66        183
# 10      Corellia    175.0000      78.5              25           175        78.5                25        170

...you can always add arguments to the functions if necessary, such as na.rm, like this: mean(., na.rm = TRUE)
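
As a rough sketch of the na.rm idea with the newer `across()` syntax (assuming dplyr 1.0.0 or later), each function can be wrapped in a formula-style lambda so the extra argument is passed along. Only mean and sd are shown here to keep it short.

```r
library(dplyr)

na_safe_stats <- starwars %>%
  group_by(homeworld) %>%
  summarise(
    across(
      where(is.numeric),                  # same idea as summarise_if(is.numeric, ...)
      list(
        mean = ~ mean(.x, na.rm = TRUE),  # drop missing values before averaging
        sd   = ~ sd(.x, na.rm = TRUE)
      )
    )
  )

na_safe_stats
```

Groups where every value is missing still come back as NaN (mean) or NA (sd). If you add min or max with na.rm = TRUE, such groups return Inf or -Inf with a warning.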

Upvotes: 1
