Filter out rows in dplyr adjusting dynamically to the current date (as Year Month)

Question

I have a dataframe containing - for each monthly cohort of patients - the proportion of patients retained in the study after n months.

df <- tibble::tribble(
          ~cohort, ~months_elapsed, ~act_patients,
        "2020-06",               1,        100,
        "2020-05",               1,        100,
        "2020-05",               2,         80,
        "2020-05",               3,         20,
        "2020-04",               1,        100,
        "2020-04",               2,         80,
        "2020-04",               3,         50,
        "2020-04",               4,         10
        )

# A tibble: 8 x 3
  cohort  months_elapsed act_patients
                   
1 2020-06              1       100
2 2020-05              1       100
3 2020-05              2        80
4 2020-05              3        20
5 2020-04              1       100
6 2020-04              2        80
7 2020-04              3        50
8 2020-04              4        10

Unfortunately, the dataframe contains inconsistent rows I want to clean out as follows:

Let's imagine we are at the end of the month of June 2020.

The cohort 2020-06, after the current month has elapsed, has the totality (100%) of the patients active in the study.
The cohort of 2020-05, had 100% of the patients after a month elapsed, 80% of the patients after the second month elapsed. But here I have an extra row, containing the % of active patients of a month that didn't elapse yet (the 3rd month). Being at the end of June 2020, only 2 months have fully elapsed (May and June)
Same for the cohort of 2020-04, I have an extra row (the 4th month) I want to clean out: the cohort of April 2020, at the end of June, had just 3 possible months that elapsed (April, May, June).

The final - clean - dataframe I want to have is:

df_clean <- tibble::tribble(
                ~cohort, ~months_elapsed, ~act_patients,
              "2020-06",               1,        100,
              "2020-05",               1,        100,
              "2020-05",               2,         80,
              "2020-04",               1,        100,
              "2020-04",               2,         80,
              "2020-04",               3,         50
              )

# A tibble: 6 x 3
  cohort  months_elapsed act_patients
                   
1 2020-06              1       100
2 2020-05              1       100
3 2020-05              2        80
4 2020-04              1       100
5 2020-04              2        80
6 2020-04              3        50

I am looking for a possible solution in dplyr, any help is highly appreciated!!

Wimpel · Accepted Answer

Here is my tidy solution, using %m+% from the lubridate-package.

library( tidyverse )
library( lubridate )

filter_date = as.Date( "2020-07-01" )

df %>% 
  filter( as.Date( paste0( cohort, "-01") ) %m+% months( months_elapsed ) <= filter_date )

# # A tibble: 6 x 3
#   cohort  months_elapsed act_patients
#                       
# 1 2020-06              1          100
# 2 2020-05              1          100
# 3 2020-05              2           80
# 4 2020-04              1          100
# 5 2020-04              2           80
# 6 2020-04              3           50

Filter out rows in dplyr adjusting dynamically to the current date (as Year Month)

Answers (2)

Related Questions