Reputation: 1561
I need to repeat an operation many times for different combinations of two variables (I am trying to create data for stacked barplots showing percentages). Could anyone turn the code below into a function (of the dataset and the two variables x and y) so that I can create the new data sets quickly? Or point me to a good reference or link for learning about functions and dplyr? Thanks.
dat = df %>%
  select(x, y) %>%
  group_by(x, y) %>%
  summarise(n = n()) %>%
  mutate(percentage = round(n/sum(n)*100, 1)) %>%
  ungroup() %>%
  group_by(x) %>%
  mutate(pos = cumsum(percentage) - (0.5 * percentage)) %>%
  ungroup()
dat
Upvotes: 1
Views: 61
Reputation: 3914
As suggested in the comments above, step-by-step explanations can be found here: dplyr.tidyverse.org/articles/programming.html
This guide explains the quo() function and the !! operator.
For your example you can create a function like so:
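library(dplyr)

# Small example data set
df1 <- data.frame(x1 = c(rep(3, 5), rep(7, 2)),
                  y1 = c(rep(2, 4), rep(5, 3)))

# x and y are quosures (created with quo()) that are unquoted with !!
my.summary <- function(df, x, y){
  df %>%
    select(!!x, !!y) %>%
    group_by(!!x, !!y) %>%
    summarise(n = n()) %>%                                    # count each x/y combination
    mutate(percentage = round(n/sum(n)*100, 1)) %>%           # share within each x
    ungroup() %>%
    group_by(!!x) %>%
    mutate(pos = cumsum(percentage) - (0.5 * percentage)) %>% # label midpoint per bar segment
    ungroup()
}

my.summary(df1, quo(x1), quo(y1))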
# # A tibble: 3 x 5
#      x1    y1     n percentage   pos
#   <dbl> <dbl> <int>      <dbl> <dbl>
# 1     3     2     4         80    40
# 2     3     5     1         20    90
# 3     7     5     2        100    50
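If your installed versions are recent enough (I am assuming dplyr 1.0.0 or later and rlang 0.4.0 or later), the same idea can be written with the embrace operator {{ }}, which captures and unquotes the argument in one step, so the caller does not need quo(). The function name my_summary2 is just a placeholder of mine, and this is only a sketch of one way to do it:

```r
library(dplyr)

df1 <- data.frame(x1 = c(rep(3, 5), rep(7, 2)),
                  y1 = c(rep(2, 4), rep(5, 3)))

# Hypothetical variant of my.summary(): column names are passed bare
# and forwarded to dplyr with {{ }} instead of quo() and !!
my_summary2 <- function(df, x, y) {
  df %>%
    group_by({{ x }}, {{ y }}) %>%
    # .groups = "drop_last" keeps the result grouped by x only
    summarise(n = n(), .groups = "drop_last") %>%
    mutate(percentage = round(n / sum(n) * 100, 1),
           pos = cumsum(percentage) - (0.5 * percentage)) %>%
    ungroup()
}

res <- my_summary2(df1, x1, y1)
res
```

Called this way it should return the same three rows as my.summary(df1, quo(x1), quo(y1)).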
Upvotes: 1