Display name
Display name

Reputation: 4501

How to create snapshots in time with R

library(tidyverse)

df <- tibble(`Action Item ID` = c("ABC", "EFG", "HIJ", "KLM", "NOP", "QRS"),
             `Date Created` = as.Date(c("2019-01-01", "2019-01-01", 
                                        "2019-06-01", "2019-06-01",
                                        "2019-08-01", "2019-08-01")),
             `Date Closed` = as.Date(c("2019-01-15", "2019-05-31", 
                                        "2019-06-15", "2019-07-05",
                                        "2019-08-15", NA)),
             `Current Status` = c(rep("Closed", 5), "Open"))

#> # A tibble: 6 x 4
#>   `Action Item ID` `Date Created` `Date Closed` `Current Status`
#>   <chr>            <date>         <date>        <chr>           
#> 1 ABC              2019-01-01     2019-01-15    Closed          
#> 2 EFG              2019-01-01     2019-05-20    Closed          
#> 3 HIJ              2019-06-01     2019-06-15    Closed          
#> 4 KLM              2019-06-01     2019-07-05    Closed          
#> 5 NOP              2019-08-01     2019-08-15    Closed          
#> 6 QRS              2019-08-01     NA            Open  

I'm trying to construct a line plot of open action items month to month, utilizing the data frame (tibble) shown above. Each month will be a snapshot in time (that no longer exists), on the last day of said month. Let's look at two action items to illustrate my problem.

On the last day of January at midnight (my 1st snapshot in time):

It's seems easier said than done. Perhaps I'm just inexperienced. What is the "Magic Code" I can write below that will:

  1. Strip all the months from my data frame to count in the fashion shown below.
  2. Fill in missing months, even if they're not there in my data frame, ie tidyr::complete, with appropriate values (even 0 when needed).
  3. Continuously count open action items month to month, until they're closed

Here's the results of "Magic Code", performed manually by me. Please note that everything has to be automated, I can't manually be changing month names month to month. Thank you.

df.months <- "Magic Code"

#> # A tibble: 6 x 4
#> `Month`       `Action Item Qty Open at End of Month` 
#> <date>         <integer>    
#> 2019-01-01     1
#> 2019-02-01     1
#> 2019-03-01     1
#> 2019-04-01     1
#> 2019-05-01     0
#> 2019-06-01     1
#> 2019-07-01     0
#> 2019-08-01     1

Upvotes: 1

Views: 233

Answers (1)

Jon Spring
Jon Spring

Reputation: 66825

Here's one approach. First reshape to longer form, then count Created as adding one and Closed as decreasing one. Then count monthly totals of those increments, and fill in missing months.

df %>%
  # convert to longer form, with one row for each Created or Closed
  pivot_longer(-c(`Action Item ID`, `Current Status`), "type", "date") %>%
  mutate(change = if_else(type == "Date Created", 1, -1)) %>%
  mutate(month = lubridate::floor_date(value, "month")) %>%
  arrange(value) %>%
  # get the sum of "change" for each month. Equivalent to:
  #    group_by(month) %>% summarize(n = sum(change) %>%
  count(month, wt = change) %>%
  # Add rows for any missing months in the series and replace NAs with 0's
  padr::pad() %>%
  replace_na(list(n=0)) %>%
  # Track cumulative change across all months to date
  mutate("Open at end of month" = cumsum(n))


## A tibble: 9 x 3
#  month          n `Open at end of month`
#  <date>     <dbl>                  <dbl>
#1 2019-01-01     1                      1
#2 2019-02-01     0                      1
#3 2019-03-01     0                      1
#4 2019-04-01     0                      1
#5 2019-05-01    -1                      0
#6 2019-06-01     1                      1
#7 2019-07-01    -1                      0
#8 2019-08-01     1                      1
#9 NA            -1                      0

Upvotes: 1

Related Questions