Reputation: 814
I need to write a dplyr function that creates a customised area plot. So here's my attempt.
area_plot <- function(data, what, by){
  by <- ensym(by)
  what <- ensym(what)
  data %>%
    filter(!is.na(!!by)) %>%
    group_by(date, !!by) %>%
    summarise(!!what := sum(!!what, na.rm = TRUE)) %>%
    complete(date, !!by, fill = rlang::list2(!!what := 0)) %>%
    ggplot(aes(date, !!what, fill = !!by)) +
    geom_area(position = 'stack') +
    scale_x_date(breaks = '1 month', date_labels = '%Y-%m', expand = c(.01, .01)) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 90, vjust = .4)) +
    labs(fill = '')
}
But I've been wondering whether there is a default value for the by argument that would output a geom_area plot for all groups together. I know that I can use if to define the data passed to ggplot2 first and do something like this inside the function:
if (by != 'default') {
  data <- data %>%
    filter(!is.na(!!by)) %>%
    group_by(date, !!by) %>%
    summarise(!!what := sum(!!what, na.rm = TRUE)) %>%
    complete(date, !!by, fill = rlang::list2(!!what := 0))
}
ggplot(data, aes(date, !!what, fill = !!by)) +
  geom_area(position = 'stack') +
  scale_x_date(breaks = '1 month', date_labels = '%Y-%m', expand = c(.01, .01)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = .4)) +
  labs(fill = '')
But I wonder whether there's a neat trick to pass some value (e.g. a constant) to group_by that would make summarise preserve the original structure (so basically, do nothing) despite being called. The behaviour would be similar to what happens when you provide a constant to an aesthetic in ggplot2.
Please see the sample of the data attached. group is an optional grouping variable.
structure(list(date = structure(c(17052, 17654, 17111, 17402,
17090, 17765, 17181, 17301, 17496, 17051, 16980, 17155, 17599,
16986, 17607, 17620, 17328, 17085, 17666, 17759, 17238, 16975,
17242, 17322, 17625, 17598, 17124, 17648, 17675, 17613, 17044,
16984, 16968, 17421, 17152, 17148, 17418, 17017, 17655, 17148,
16981, 17644, 17149, 17090, 17548, 17474, 17564, 17530, 17237,
17679, 17166, 17470, 17427, 17306, 17677, 17600, 17458, 17697,
17602, 16990, 17111, 17150, 17561, 17406, 17135, 17181, 17014,
17419, 17273, 17416, 17101, 17367, 17170, 17015, 17386, 17444,
17507, 17592, 17058, 17292, 16966, 17756, 17239, 17479, 17260,
17477, 16989, 17032, 17219, 17430, 17696, 17487, 17578, 17759,
17269, 17634, 17279, 17478, 17222, 17296), class = "Date"), count = c(2,
4, 2, 3, 6, 1, 4, 8, 1, 5, 1, 5, 1, 1, 2, 6, 3, 5, 2, 7, 3, 4,
1, 3, 4, 2, 4, 1, 2, 3, 16, 1, 5, 4, 3, 4, 4, 6, 1, 3, 3, 1,
3, 10, 5, 1, 4, 2, 2, 4, 5, 26, 4, 9, 3, 1, 3, 1, 4, 1, 2, 3,
1, 13, 3, 1, 3, 1, 1, 3, 1, 3, 3, 4, 1, 2, 2, 3, 1, 9, 3, 1,
2, 1, 4, 2, 1, 2, 4, 3, 2, 3, 1, 6, 5, 1, 2, 2, 3, 4), group = c("NON-FOOD",
NA, NA, NA, NA, "MIX", NA, NA, "MIX", NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, "FOOD", NA, "FOOD", NA, NA, "MIX",
NA, NA, NA, "FOOD", "FOOD", NA, NA, NA, NA, "FOOD", NA, NA, "FOOD",
NA, NA, NA, "FOOD", NA, NA, NA, NA, NA, NA, NA, NA, "MIX", NA,
NA, "FOOD", NA, "FOOD", NA, NA, "FOOD", NA, "FOOD", NA, NA, "NON-FOOD",
NA, NA, "MIX", "NON-FOOD", NA, NA, NA, NA, NA, NA, "IMAGE", NA,
"FOOD", NA, NA, NA, "FOOD", NA, "FOOD", NA, NA, NA, NA, NA, NA,
NA, NA, "FOOD", "FOOD", NA, NA, NA)), row.names = c(73008L, 535553L,
122359L, 321655L, 105632L, 646925L, 172409L, 256204L, 394666L,
72385L, 20180L, 156162L, 478525L, 91409L, 485397L, 501386L, 277336L,
100902L, 549629L, 640676L, 209400L, 16603L, 224543L, 272638L,
505291L, 475497L, 131845L, 529041L, 558295L, 491746L, 67156L,
23499L, 11150L, 334454L, 154958L, 150674L, 333348L, 45599L, 536064L,
150673L, 20668L, 524095L, 151809L, 105713L, 433853L, 375687L,
445626L, 420587L, 208594L, 562514L, 162403L, 372594L, 338509L,
259784L, 560356L, 480072L, 361471L, 579474L, 481262L, 26469L,
122119L, 152537L, 443426L, 325045L, 140531L, 171908L, 43547L,
333968L, 237152L, 332106L, 114754L, 298081L, 164923L, 43577L,
311250L, 350267L, 404348L, 470188L, 78329L, 250086L, 9486L, 638289L,
209638L, 379370L, 227299L, 377487L, 26333L, 55058L, 195261L,
340666L, 578515L, 387600L, 457752L, 640729L, 235389L, 514348L,
240303L, 378836L, 197409L, 252746L), class = "data.frame")
Upvotes: 7
Views: 1219
Reputation: 8870
For a one-line solution, you could use a combination of across() and any_of(), together with as_label() (and enquo() if you use it within a function):
library(dplyr, warn.conflicts = FALSE)
group_maybe <- function(df, by = NULL){
  df %>%
    group_by(across(any_of(as_label(enquo(by)))))
}
group_maybe(iris, by = Species)
#> # A tibble: 150 × 5
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> # ℹ 148 more rows
group_maybe(iris)
#> # A tibble: 150 × 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> # ℹ 148 more rows
Created on 2024-06-18 with reprex v2.1.0
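To connect this back to the question, here is a minimal sketch of the same across(any_of(...)) trick followed by a summarise() step. The helper name sum_maybe_grouped, the output column total, and the toy data frame are made up for illustration; they are not part of the original answer.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: sum `what`, grouped by `by` only if `by` is supplied.
sum_maybe_grouped <- function(df, what, by = NULL) {
  df %>%
    # With by = NULL the label is "NULL", which matches no column,
    # so any_of() selects nothing and the data stays ungrouped.
    group_by(across(any_of(as_label(enquo(by))))) %>%
    summarise(total = sum({{ what }}, na.rm = TRUE), .groups = "drop")
}

# Toy data, invented for this example
toy <- data.frame(
  group = c("a", "a", "b", NA),
  count = c(1, 2, 3, 4)
)

grouped   <- sum_maybe_grouped(toy, count, by = group)  # one row per group value
ungrouped <- sum_maybe_grouped(toy, count)              # a single overall total
```

One caveat with this approach: any_of() silently ignores names that are not columns, so a misspelled by falls back to the ungrouped result instead of raising an error.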
Upvotes: 0
Reputation: 16832
Here's one way to do the first few steps of your function (I didn't go into all the ggplot stuff, just how you could approach grouping). In general, to set a default "do nothing" action, such as defaulting to no grouping, you'll use argument = NULL in your function; you can look at other functions' doc pages to see how this is done. Here's an SO post on the difference between NA and NULL.
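As a rough illustration of that distinction (base R only, not taken from the linked post): NULL is the absence of a value, while NA is a missing value that still occupies a slot. That is why NULL works well as a "nothing supplied" default.

```r
# NULL has length zero and disappears when combined with other values
len_null <- length(NULL)          # 0
with_null <- c(1, NULL, 3)        # two elements: 1 and 3

# NA has length one and is kept as a placeholder
len_na <- length(NA)              # 1
with_na <- c(1, NA, 3)            # three elements, the middle one missing

# is.null() is the usual check for "argument left at its NULL default"
null_check <- c(is.null(NULL), is.null(NA))   # TRUE, FALSE
```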
I'm not super adept at working with quosures, but I've built a few functions and often rely on some rlang/tidyselect helper functions, such as rlang::quo_is_null that I'm using here. Someone else may be able to rewrite this without helpers.
First, here is the behavior you're looking for, as grouped and ungrouped summaries:
library(tidyverse)
# grouped
df %>%
  filter(!is.na(group)) %>%
  group_by(group) %>%
  summarise(count = sum(count, na.rm = TRUE))
#> # A tibble: 4 x 2
#> group count
#> <chr> <dbl>
#> 1 FOOD 34
#> 2 IMAGE 1
#> 3 MIX 8
#> 4 NON-FOOD 6
# not grouped
df %>%
  # add in if you want to filter ungrouped data
  summarise(count = sum(count, na.rm = TRUE))
#> count
#> 1 347
Then in the function, I create what_var as the quosure version of what (rlang experts, feel free to correct me on this terminology...?). I generally add _var to names to keep track of what's the original argument and what's already been enquo'ed. To check whether the argument by is null, I create a quosure of by and check whether that is null. If it's not null, i.e. if some column name was supplied for by, filter and group by that quosure. If it is null, just pass along the original data frame. I assign the data to a new variable in the else statement to avoid operating on the original data frame. Then, regardless of whether the data is grouped, summarize what.
to_group_or_not_to_group <- function(data, what, by = NULL) {
  what_var <- enquo(what)
  if (!rlang::quo_is_null(enquo(by))) {
    by_var <- enquo(by)
    grouped_or_not <- data %>%
      filter(!is.na(!!by_var)) %>%
      group_by(!!by_var)
  } else {
    grouped_or_not <- data
  }
  grouped_or_not %>%
    summarise(!!quo_name(what_var) := sum(!!what_var, na.rm = TRUE))
}
Verify that you got your targeted results. With a grouping variable:
df %>%
  to_group_or_not_to_group(what = count, by = group)
#> # A tibble: 4 x 2
#> group count
#> <chr> <dbl>
#> 1 FOOD 34
#> 2 IMAGE 1
#> 3 MIX 8
#> 4 NON-FOOD 6
Supplying NULL as the (absence of) grouping variable:
df %>%
  to_group_or_not_to_group(what = count, by = NULL)
#> count
#> 1 347
Without a grouping variable, falling back on the default by = NULL:
df %>%
  to_group_or_not_to_group(what = count)
#> count
#> 1 347
Created on 2018-10-16 by the reprex package (v0.2.1)
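For completeness, the same quo_is_null() check could be carried through to the plot itself. The following is only a sketch of how that might look, assuming a date column named date as in the question. It uses date_breaks and .groups = "drop" (which needs dplyr 1.0 or later) where the question's code did not, and the toy data frame is invented for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(ggplot2)

area_plot <- function(data, what, by = NULL) {
  what_var  <- enquo(what)
  by_var    <- enquo(by)
  what_name <- as_label(what_var)

  if (rlang::quo_is_null(by_var)) {
    # No grouping column supplied: one total per date, a single area
    summed <- data %>%
      group_by(date) %>%
      summarise(!!what_name := sum(!!what_var, na.rm = TRUE), .groups = "drop")
    p <- ggplot(summed, aes(date, !!what_var))
  } else {
    # Grouping column supplied: totals per date and group, gaps filled with 0.
    # .groups = "drop" removes the grouping before complete() is called.
    summed <- data %>%
      filter(!is.na(!!by_var)) %>%
      group_by(date, !!by_var) %>%
      summarise(!!what_name := sum(!!what_var, na.rm = TRUE), .groups = "drop") %>%
      complete(date, !!by_var, fill = rlang::list2(!!what_name := 0))
    p <- ggplot(summed, aes(date, !!what_var, fill = !!by_var))
  }

  p +
    geom_area(position = "stack") +
    scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m", expand = c(.01, .01)) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 90, vjust = .4)) +
    labs(fill = "")
}

# Toy data, invented for this example
toy <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-01-01", "2020-02-01", "2020-02-01")),
  count = c(1, 2, 3, 4),
  group = c("a", "b", "a", NA)
)

p_grouped   <- area_plot(toy, count, by = group)  # stacked areas, one per group
p_ungrouped <- area_plot(toy, count)              # one area for all rows together
```

The if/else is kept deliberately explicit here: the ungrouped branch drops the fill aesthetic entirely, so no constant or dummy grouping column is needed.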
Upvotes: 8