crm_analytics

Reputation: 59

Summarize variables side by side

I am looking for a solution to my problem. So far I can only solve it by rearranging the columns manually.

Example code:

library(dplyr)

set.seed(1)
Data <- data.frame(
  W = sample(1:10),
  X = sample(1:10),
  Y = sample(c("yes", "no"), 10, replace = TRUE),
  Z = sample(c("cat", "dog"), 10, replace = TRUE)
)

summarized <- Data %>%
  group_by(Z) %>%
  summarise_if(is.numeric, funs(mean, median), na.rm = TRUE)

print(Data)


I want the output ordered by variable: every function applied to the first column, then every function applied to the second column, and so on. My code does the opposite and groups the columns by function.

Of course I could rearrange the columns by hand, but that is not what Data Science is about. I have hundreds of columns and want to apply multiple functions.

This is the result I want, produced here by reordering the columns by position:

summarized <- summarized[, c(1, 2, 4, 3, 5)]  # best solution so far


Is there an argument I am missing? I bet there is an easy solution, or another function that does the job. Thanks in advance!

Upvotes: 2

Views: 65

Answers (2)

akrun

Reputation: 887118

One option is to post-process the result with the appropriate select helpers:

library(dplyr)
summarized %>% 
    select(Z, starts_with('W'), everything())
# A tibble: 2 x 5
#  Z     W_mean W_median X_mean X_median
#  <fct>  <dbl>    <dbl>  <dbl>    <dbl>
#1 cat     5.25      5.5   3.75      3.5
#2 dog     5.67      5.5   6.67      7  

If there are hundreds of columns, one approach is to strip the function suffix from the column names and order the columns by the remaining prefix:

library(stringr)
summarized %>% 
    select(Z, order(str_remove(names(.), "_.*")))
# A tibble: 2 x 5
#  Z     W_mean W_median X_mean X_median
#  <fct>  <dbl>    <dbl>  <dbl>    <dbl>
#1 cat     5.25      5.5   3.75      3.5
#2 dog     5.67      5.5   6.67      7  
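
To see why ordering by prefix works, here is a minimal base R sketch of just the name-ordering step. The `cols` vector is typed in by hand to mirror the column names that the question's `summarise_if()` call produces; `prefix`, `idx`, and `reordered` are illustrative names, not part of the original answer.

```r
# Column names in the order the question's summarise_if() call returns them
cols <- c("Z", "W_mean", "X_mean", "W_median", "X_median")

# Drop everything from the first underscore on, leaving the variable prefix
prefix <- sub("_.*", "", cols)

# order() keeps ties in their original order, so within each variable
# the "mean" column stays ahead of the "median" column
idx <- order(prefix)

# Put the grouping column first, then the remaining columns in prefix order
reordered <- union("Z", cols[idx])
print(reordered)
```

Two caveats, both assumptions about data not shown here: the pattern `"_.*"` cuts at the first underscore, so variable names that themselves contain underscores would probably need something like `"_[^_]*$"` to strip only the last suffix; and `order()` sorts the prefixes alphabetically, which matches the original column order in this example only because W comes before X.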

Upvotes: 4

Daniel D. Sjoberg

Reputation: 11680

You can use starts_with() to select the columns by name prefix instead of by position.

library(dplyr)
set.seed(1)
Data <- data.frame(
  W = sample(1:10),
  X = sample(1:10),
  Y = sample(c("yes", "no"), 10, replace = TRUE),
  Z = sample(c("cat", "dog"), 10, replace = TRUE)
)        

summarized <- 
  Data %>% 
  group_by(Z) %>% 
  summarise_if(is.numeric, funs(mean, median), na.rm = TRUE) %>%
  select(Z, starts_with("W_"), starts_with("X_"))

summarized
#> # A tibble: 2 x 5
#>   Z     W_mean W_median X_mean X_median
#>   <fct>  <dbl>    <dbl>  <dbl>    <dbl>
#> 1 cat     5.25      5.5   3.75      3.5
#> 2 dog     5.67      5.5   6.67      7

Created on 2019-12-09 by the reprex package (v0.3.0)
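
Listing one starts_with() call per variable becomes tedious with hundreds of columns. As a possible alternative, the sketch below assumes dplyr 1.0.0 or later, which was released after this answer was written; its `across()` function emits all functions for one column before moving on to the next, so no reordering step should be needed. The lambda-style function list is my own choice here, not taken from the answers above.

```r
library(dplyr)

set.seed(1)
Data <- data.frame(
  W = sample(1:10),
  X = sample(1:10),
  Y = sample(c("yes", "no"), 10, replace = TRUE),
  Z = sample(c("cat", "dog"), 10, replace = TRUE)
)

# across() applies every function in the list to each selected column in turn,
# naming the results <column>_<function>
summarized <- Data %>%
  group_by(Z) %>%
  summarise(across(
    where(is.numeric),
    list(
      mean   = ~ mean(.x, na.rm = TRUE),
      median = ~ median(.x, na.rm = TRUE)
    )
  ))

names(summarized)
```

The numeric values may differ from the output shown above, because the default behaviour of `sample()` changed in R 3.6.0; the column order is the point of the example.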

Upvotes: 2
