user9026

Reputation: 970

Summary statistics of numeric variables in data frame in specific format

I have a data frame with 3 numeric variables. I need to calculate summary statistics for these variables, such as mean, median, standard deviation, and kurtosis, and then arrange the results in a data frame: the first column should contain the variable names, the second column the mean values, the third column the median values, and so on. How can this be achieved? I am familiar with the dplyr package, so any suggestions using it are welcome.

Upvotes: 1

Views: 1200

Answers (1)

Ronak Shah

Reputation: 389175

You can use summarise with across:

library(dplyr)
library(tidyr)

mtcars %>%
  select(1:3) %>%
  summarise(across(where(is.numeric), list(mean = mean, std = sd, med = median)))

#  mpg_mean  mpg_std mpg_med cyl_mean  cyl_std cyl_med disp_mean disp_std disp_med
#1 20.09062 6.026948    19.2   6.1875 1.785922       6  230.7219 123.9387    196.3

In older versions of dplyr (before 1.0.0, which introduced across), you can use summarise_if:

mtcars %>%
  select(1:3) %>%
  summarise_if(is.numeric, list(mean = mean, std = sd, med = median))

You can add pivot_longer to the code above to get the data in the required format:

mtcars %>%
  select(1:3) %>%
  summarise(across(where(is.numeric), list(mean = mean, std = sd, med = median))) %>%
  pivot_longer(cols = everything(),
               names_to = c('col', '.value'),
               names_sep = '_')


# A tibble: 3 x 4
#  col     mean    std   med
#  <chr>  <dbl>  <dbl> <dbl>
#1 mpg    20.1    6.03  19.2
#2 cyl     6.19   1.79   6  
#3 disp  231.   124.   196. 
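One caveat with names_sep = '_': it only splits cleanly when the original column names contain no underscores. A minimal sketch of one way around this, assuming hypothetical columns body_mass and bill_len (not from the question), is to use names_pattern with a greedy first group so the split happens at the last underscore:

```r
library(dplyr)
library(tidyr)

# Hypothetical data whose column names already contain underscores
df <- data.frame(body_mass = c(1, 2, 3, 4),
                 bill_len  = c(10, 20, 30, 60))

res <- df %>%
  summarise(across(where(is.numeric), list(mean = mean, std = sd, med = median))) %>%
  # "(.*)_(.*)": the first group is greedy, so "body_mass_mean" is split
  # into "body_mass" (col) and "mean" (.value) at the last underscore
  pivot_longer(cols = everything(),
               names_to = c("col", ".value"),
               names_pattern = "(.*)_(.*)")

res
```

This relies on the function names in the list (mean, std, med) not containing underscores themselves.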

Or you can pivot first and then do the calculation per variable:

mtcars %>%
  select(1:3) %>%
  pivot_longer(cols = everything()) %>%
  group_by(name) %>%
  summarise(mean = mean(value), std = sd(value), med = median(value))
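The question also mentions kurtosis, which base R does not provide. As a rough sketch, the pivot-first approach can be extended with a small helper. The function kurt below is a hypothetical name, and it assumes the moment-based (Pearson) definition m4 / m2^2; packages such as moments or e1071 offer kurtosis functions too, and definitions differ between them, so check which one you need.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: moment-based (Pearson) kurtosis, m4 / m2^2.
# Subtract 3 if you want excess kurtosis instead.
kurt <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

res <- mtcars %>%
  select(1:3) %>%
  pivot_longer(cols = everything()) %>%
  group_by(name) %>%
  summarise(mean = mean(value),
            med = median(value),
            std = sd(value),
            kurtosis = kurt(value))

res
```

Note that group_by sorts the result by name, so the rows come out as cyl, disp, mpg rather than in the original column order.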

Upvotes: 3
