Reputation: 4501
library(tidyverse)
df <- tibble(`Action Item ID` = c("ABC", "EFG", "HIJ", "KLM", "NOP", "QRS"),
`Date Created` = as.Date(c("2019-01-01", "2019-01-01",
"2019-06-01", "2019-06-01",
"2019-08-01", "2019-08-01")),
`Date Closed` = as.Date(c("2019-01-15", "2019-05-31",
"2019-06-15", "2019-07-05",
"2019-08-15", NA)),
`Current Status` = c(rep("Closed", 5), "Open"))
#> # A tibble: 6 x 4
#> `Action Item ID` `Date Created` `Date Closed` `Current Status`
#> <chr> <date> <date> <chr>
#> 1 ABC 2019-01-01 2019-01-15 Closed
#> 2 EFG 2019-01-01 2019-05-20 Closed
#> 3 HIJ 2019-06-01 2019-06-15 Closed
#> 4 KLM 2019-06-01 2019-07-05 Closed
#> 5 NOP 2019-08-01 2019-08-15 Closed
#> 6 QRS 2019-08-01 NA Open
I'm trying to construct a line plot of open action items month to month, utilizing the data frame (tibble) shown above. Each month will be a snapshot in time (that no longer exists), on the last day of said month. Let's look at two action items to illustrate my problem.
On the last day of January at midnight (my 1st snapshot in time):
ABC
was Closed, which matches it's current status as of today. EFG
was Open, however it's now closed with the correct Current Status == "Closed"
as of today. But I want to know that it was open back on January 31st, and be able to count all such occurrences.It's seems easier said than done. Perhaps I'm just inexperienced. What is the "Magic Code" I can write below that will:
tidyr::complete
, with appropriate values (even 0
when needed).Here's the results of "Magic Code", performed manually by me. Please note that everything has to be automated, I can't manually be changing month names month to month. Thank you.
df.months <- "Magic Code"
#> # A tibble: 6 x 4
#> `Month` `Action Item Qty Open at End of Month`
#> <date> <integer>
#> 2019-01-01 1
#> 2019-02-01 1
#> 2019-03-01 1
#> 2019-04-01 1
#> 2019-05-01 0
#> 2019-06-01 1
#> 2019-07-01 0
#> 2019-08-01 1
Upvotes: 1
Views: 233
Reputation: 66825
Here's one approach. First reshape to longer form, then count Created as adding one and Closed as decreasing one. Then count monthly totals of those increments, and fill in missing months.
df %>%
# convert to longer form, with one row for each Created or Closed
pivot_longer(-c(`Action Item ID`, `Current Status`), "type", "date") %>%
mutate(change = if_else(type == "Date Created", 1, -1)) %>%
mutate(month = lubridate::floor_date(value, "month")) %>%
arrange(value) %>%
# get the sum of "change" for each month. Equivalent to:
# group_by(month) %>% summarize(n = sum(change) %>%
count(month, wt = change) %>%
# Add rows for any missing months in the series and replace NAs with 0's
padr::pad() %>%
replace_na(list(n=0)) %>%
# Track cumulative change across all months to date
mutate("Open at end of month" = cumsum(n))
## A tibble: 9 x 3
# month n `Open at end of month`
# <date> <dbl> <dbl>
#1 2019-01-01 1 1
#2 2019-02-01 0 1
#3 2019-03-01 0 1
#4 2019-04-01 0 1
#5 2019-05-01 -1 0
#6 2019-06-01 1 1
#7 2019-07-01 -1 0
#8 2019-08-01 1 1
#9 NA -1 0
Upvotes: 1