Reputation: 3321
I would like to identify the best process for producing summary text in a final report.
x <- tribble(
~year, ~service, ~account, ~amount,
"2001", "Army", "operations", 5000000,
"2001", "Navy", "operations", 1500000,
"2002", "Army", "operations", 6000000,
"2002", "Navy", "operations", 1700000,
"2001", "Army", "repair", 500000,
"2001", "Navy", "repair", 300000,
"2002", "Army", "repair", 400000,
"2002", "Navy", "repair", 600000)
Desired text, for each service.
"Between [year.min] and [year.max], the [service]
spent an average of [average amount]. The largest account
in terms of spending within the [service] was [account],
which ranked [rank] and fluctuated between [min amount]
and [max amount], with a high of [max amount] in [year] to
a low of [min] in [year]."
Desired Output would be in a table. The process would repeated at many sublevels (account, sub-account, etc).
service summary_text
<chr> <chr>
1 Army concatenated
2 Navy concatenated
Ultimately, I would like to export the result as an html table beside sparklines, which is fairly trivial in Excel.
service sparkline summary_text
<chr> <chr> <chr>
1 Army sparkline concatenated text
2 Navy sparkline concatenated text
Upvotes: 0
Views: 75
Reputation: 3321
Moody Mudskipper's answer with sparklines.
library(tidyverse)
library(sparkline)
library(formattable)
library(glue)
#Data
x <- tribble(
~year, ~service, ~account, ~amount,
"2001", "Army", "operations", 5000000,
"2001", "Navy", "operations", 1500000,
"2002", "Army", "operations", 6000000,
"2002", "Navy", "operations", 1700000,
"2001", "Army", "repair", 500000,
"2001", "Navy", "repair", 300000,
"2002", "Army", "repair", 400000,
"2002", "Navy", "repair", 600000)
# Assemble Text
table <- x %>%
group_by(service, year) %>%
summarise(total = sum(amount)) %>%
group_by(service) %>%
summarise(mean_annual_service = mean(total),
# years range
first.year = min(year),
last.year = max(year),
# min and max years, amounts
year.min= year[which.min(total)],
year.max = year[which.max(total)],
min.amount = total[which.min(total)],
max.amount = total[which.max(total)]) %>%
# Final Text
mutate(Description = glue('Between {first.year} and {last.year},
the average spending in the {service} was
${prettyNum(mean_annual_service, big.mark = ",")},
with a high of ${prettyNum(max.amount, big.mark = ",")} in {year.max}, and a low of
${prettyNum(min.amount, big.mark = ",")} in {year.min}') ) %>%
select(service, Description)
# Add Sparkline
x %>%
group_by(service, year) %>%
summarise(total = sum(amount)) %>%
summarise(
Sparkline = spk_chr(
total,
type = "line",
chartRangeMin=min(total),
chartRangeMax=max(total))) %>%
left_join(table) %>%
formattable() %>%
as.htmlwidget() %>%
spk_add_deps()
Upvotes: 0
Reputation: 47320
Using dplyr
and glue
with different strategies of grouping:
library(dplyr)
library(glue)
output <- x %>%
group_by(service,account) %>%
mutate(amount_sum = sum(amount)) %>%
group_by(service) %>%
mutate(average.amount=mean(amount)) %>%
filter(amount_sum == max(amount_sum)) %>%
summarize(
year.min=min(year),
year.max=max(year),
average.amount=first(average.amount),
account=first(account),
rank=1,
min.amount =min(amount),
max.amount=max(amount),
year.min.amount = year[which.min(amount)],
year.max.amount = year[which.max(amount)]) %>%
transmute(service,
summary_text= glue("Between {year.min} and {year.max}, the {service}
spent an average of {average.amount}. The largest account
in terms of spending within the {service} was {account},
which ranked {rank} and fluctuated between {min.amount}
and {max.amount}, with a high of {max.amount} in {year.max.amount} to
a low of {min.amount} in {year.min.amount}."))
output %>% pull(summary_text)
# Between 2001 and 2002, the Army
# spent an average of 2975000. The largest account
# in terms of spending within the Army was operations,
# which ranked NA and fluctuated between 5e+06
# and 6e+06, with a high of 6e+06 in 2002 to
# a low of 5e+06 in 2001.
# Between 2001 and 2002, the Navy
# spent an average of 1025000. The largest account
# in terms of spending within the Navy was operations,
# which ranked NA and fluctuated between 1500000
# and 1700000, with a high of 1700000 in 2002 to
# a low of 1500000 in 2001.
You could use paste
or sprintf
instead of glue
if you want to limit external library dependencies, but your example is more readable this way.
I assumed rank
was always 1
in this example. If you want to deal with subaccounts I suggest you use the same trick as I did, before the summarize
call use group_by
and mutate
, so you can create a new column constant by group. Then call first
in summarize
.
Upvotes: 4