Nettle
Nettle

Reputation: 3321

Concatenating strings and calculated variables in table

I would like to identify the best process for producing summary text in a final report.

x <- tribble(
  ~year,       ~service,   ~account,     ~amount,
  "2001",       "Army",     "operations",  5000000,
  "2001",       "Navy",     "operations",  1500000,      
  "2002",       "Army",     "operations",  6000000,
  "2002",       "Navy",     "operations",  1700000,    
  "2001",       "Army",     "repair",       500000,
  "2001",       "Navy",     "repair",       300000,      
  "2002",       "Army",     "repair",       400000,
  "2002",       "Navy",     "repair",       600000)

Desired text, for each service.

"Between [year.min] and [year.max], the [service] 
spent an average of [average amount]. The largest account
in terms of spending within the [service] was [account], 
which ranked [rank] and fluctuated between [min amount]
and [max amount], with a high of [max amount] in [year] to
a low of [min] in [year]."

Desired Output would be in a table. The process would repeated at many sublevels (account, sub-account, etc).

service    summary_text              
  <chr>        <chr>                     
1 Army     concatenated 
2 Navy     concatenated 

Ultimately, I would like to export the result as an html table beside sparklines, which is fairly trivial in Excel.

service sparkline   summary_text              
  <chr>   <chr>      <chr>                     
1 Army    sparkline concatenated text 
2 Navy    sparkline concatenated text

Upvotes: 0

Views: 75

Answers (2)

Nettle
Nettle

Reputation: 3321

Moody Mudskipper's answer with sparklines.

library(tidyverse)
library(sparkline)
library(formattable)
library(glue)

#Data
x <- tribble(
  ~year,       ~service,   ~account,     ~amount,
  "2001",       "Army",     "operations",  5000000,
  "2001",       "Navy",     "operations",  1500000,      
  "2002",       "Army",     "operations",  6000000,
  "2002",       "Navy",     "operations",  1700000,    
  "2001",       "Army",     "repair",       500000,
  "2001",       "Navy",     "repair",       300000,      
  "2002",       "Army",     "repair",       400000,
  "2002",       "Navy",     "repair",       600000)


# Assemble Text
table <- x %>% 
  group_by(service, year) %>% 
  summarise(total = sum(amount)) %>% 
  group_by(service) %>% 
  summarise(mean_annual_service = mean(total),
            # years range
            first.year = min(year),
            last.year = max(year),
            # min and max years, amounts
            year.min= year[which.min(total)],
            year.max = year[which.max(total)],
            min.amount = total[which.min(total)],
            max.amount = total[which.max(total)]) %>% 
  # Final Text
  mutate(Description = glue('Between {first.year} and {last.year},
                        the average spending in the {service} was 
                        ${prettyNum(mean_annual_service, big.mark = ",")},
                        with a high of ${prettyNum(max.amount, big.mark = ",")} in {year.max}, and a low of
                        ${prettyNum(min.amount, big.mark = ",")} in {year.min}') ) %>% 
  select(service, Description)


# Add Sparkline
x %>% 
    group_by(service, year) %>%
    summarise(total = sum(amount)) %>% 
    summarise(
      Sparkline = spk_chr(
        total, 
        type = "line",
        chartRangeMin=min(total), 
        chartRangeMax=max(total))) %>% 
  left_join(table) %>% 
  formattable() %>% 
  as.htmlwidget() %>% 
  spk_add_deps()

Text and Sparklines

Upvotes: 0

moodymudskipper
moodymudskipper

Reputation: 47320

Using dplyr and glue with different strategies of grouping:

library(dplyr)
library(glue)
output <- x %>% 
  group_by(service,account) %>%
  mutate(amount_sum = sum(amount)) %>%
  group_by(service) %>%
  mutate(average.amount=mean(amount)) %>%
  filter(amount_sum == max(amount_sum)) %>%
  summarize(
    year.min=min(year),
    year.max=max(year),
    average.amount=first(average.amount),
    account=first(account),
    rank=1,
    min.amount =min(amount),
    max.amount=max(amount),
    year.min.amount = year[which.min(amount)],
    year.max.amount = year[which.max(amount)]) %>%
  transmute(service,
            summary_text= glue("Between {year.min} and {year.max}, the {service} 
                               spent an average of {average.amount}. The largest account
                               in terms of spending within the {service} was {account}, 
                               which ranked {rank} and fluctuated between {min.amount}
                               and {max.amount}, with a high of {max.amount} in {year.max.amount} to
                               a low of {min.amount} in {year.min.amount}."))

output %>% pull(summary_text)
# Between 2001 and 2002, the Army 
# spent an average of 2975000. The largest account
# in terms of spending within the Army was operations, 
# which ranked NA and fluctuated between 5e+06
# and 6e+06, with a high of 6e+06 in 2002 to
# a low of 5e+06 in 2001.
# Between 2001 and 2002, the Navy 
# spent an average of 1025000. The largest account
# in terms of spending within the Navy was operations, 
# which ranked NA and fluctuated between 1500000
# and 1700000, with a high of 1700000 in 2002 to
# a low of 1500000 in 2001.

You could use paste or sprintf instead of glue if you want to limit external library dependencies, but your example is more readable this way.

I assumed rank was always 1 in this example. If you want to deal with subaccounts I suggest you use the same trick as I did, before the summarize call use group_by and mutate, so you can create a new column constant by group. Then call first in summarize.

Upvotes: 4

Related Questions