Benjamin Telkamp

Reputation: 1561

Create custom dplyr data transformation function in R

I need to repeat an operation many times for different combinations of two variables (I'm trying to create data for stacked bar plots showing percentages). Could anyone turn the code below into a function (taking the dataset and the two variables x and y as arguments) so that I can create the new datasets quickly? Alternatively, could you point me to a good reference for learning how to write functions with dplyr? Thanks.

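 library(dplyr)

 dat = df %>% 
   select(x, y) %>% 
   group_by(x, y) %>% 
   summarise(n = n()) %>%                        # count each x/y combination; result stays grouped by x
   mutate(percentage = round(n/sum(n)*100, 1)) %>%   # percentage within each level of x
   ungroup() %>% 
   group_by(x) %>% 
   mutate(pos = cumsum(percentage) - (0.5 * percentage)) %>%   # label position: middle of each stacked segment
   ungroup()  
 dat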

Upvotes: 1

Views: 61

Answers (1)

Katia

Reputation: 3914

As suggested in the comments above, a step-by-step explanation can be found here: dplyr.tidyverse.org/articles/programming.html. This guide explains the quo() function and the !! ("bang-bang") operator.
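A minimal sketch of the idea, assuming dplyr 1.0 or later (the `{{ }}` "embrace" operator needs rlang 0.4.0 or later): `quo()` captures a column name without evaluating it, and `!!` unquotes it inside a dplyr verb. `{{ }}` does both steps inside a function, so callers can pass bare column names. `df_demo`, `col` and `count_by()` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical toy data frame for illustration
df_demo <- data.frame(g = c("a", "a", "b"))

# quo() captures the column name g as an expression, without evaluating it
col <- quo(g)

# !! unquotes the captured expression inside the dplyr verb
by_quo <- df_demo %>% group_by(!!col) %>% summarise(n = n())

# {{ }} captures and unquotes in one step inside a function,
# so the caller can pass a bare column name
count_by <- function(df, var) {
  df %>% group_by({{ var }}) %>% summarise(n = n())
}
by_embrace <- count_by(df_demo, g)

print(by_quo)
print(by_embrace)
```

Both calls group by the same column, so they return the same counts.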

For your example you can create a function like so:

library(dplyr)

df1 <- data.frame(x1 = c(rep(3, 5), rep(7, 2)),
                  y1 = c(rep(2, 4), rep(5, 3)))

my.summary <- function(df, x, y){
  df %>% 
    select(!!x, !!y) %>% 
    group_by(!!x, !!y) %>% 
    summarise(n = n()) %>%
    mutate(percentage = round(n/sum(n)*100, 1)) %>% 
    ungroup() %>% 
    group_by(!!x) %>% 
    mutate(pos = cumsum(percentage) - (0.5 * percentage)) %>% 
    ungroup() 
}

my.summary(df1, quo(x1), quo(y1))

# # A tibble: 3 x 5
#      x1    y1     n percentage   pos
#   <dbl> <dbl> <int>      <dbl> <dbl>
# 1     3     2     4         80    40
# 2     3     5     1         20    90
# 3     7     5     2        100    50
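Since the goal is to repeat this for many pairs of variables, a version that takes column names as strings may be easier to loop over. This is a sketch assuming dplyr 1.0 or later, using the `.data[[ ]]` pronoun described in the same programming guide; `my.summary.chr()`, `pairs` and the extra column `y2` are hypothetical and only for illustration.

```r
library(dplyr)

# Toy data from the answer, plus a hypothetical extra column y2
df1 <- data.frame(x1 = c(rep(3, 5), rep(7, 2)),
                  y1 = c(rep(2, 4), rep(5, 3)),
                  y2 = c(1, 1, 2, 2, 2, 1, 1))

# x and y are column names given as strings, e.g. "x1", "y1"
my.summary.chr <- function(df, x, y) {
  df %>%
    # .data[[...]] looks a column up by its name
    group_by(.data[[x]], .data[[y]]) %>%
    # count each x/y combination; the result stays grouped by x
    summarise(n = n(), .groups = "drop_last") %>%
    mutate(percentage = round(n / sum(n) * 100, 1),
           # midpoint of each segment within its stacked bar
           pos = cumsum(percentage) - 0.5 * percentage) %>%
    ungroup()
}

# Run the summary for several (x, y) combinations at once
pairs <- list(c("x1", "y1"), c("x1", "y2"))
results <- lapply(pairs, function(p) my.summary.chr(df1, p[1], p[2]))
results[[1]]
```

Because `summarise()` with `.groups = "drop_last"` leaves the result grouped by `x`, the percentage and the label position can be computed in one `mutate()` instead of ungrouping and regrouping.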

Upvotes: 1
