chopin_is_the_best
chopin_is_the_best

Reputation: 2101

Filter out rows in dplyr adjusting dynamically to the current date (as Year Month)

I have a dataframe containing - for each monthly cohort of patients - the proportion of patients retained in the study after n months.

df <- tibble::tribble(
          ~cohort, ~months_elapsed, ~act_patients,
        "2020-06",               1,        100,
        "2020-05",               1,        100,
        "2020-05",               2,         80,
        "2020-05",               3,         20,
        "2020-04",               1,        100,
        "2020-04",               2,         80,
        "2020-04",               3,         50,
        "2020-04",               4,         10
        )

# A tibble: 8 x 3
  cohort  months_elapsed act_patients
  <chr>            <dbl>     <dbl>
1 2020-06              1       100
2 2020-05              1       100
3 2020-05              2        80
4 2020-05              3        20
5 2020-04              1       100
6 2020-04              2        80
7 2020-04              3        50
8 2020-04              4        10

Unfortunately, the dataframe contains inconsistent rows I want to clean out as follows:

Let's imagine we are at the end of the month of June 2020.

The final - clean - dataframe I want to have is:

df_clean <- tibble::tribble(
                ~cohort, ~months_elapsed, ~act_patients,
              "2020-06",               1,        100,
              "2020-05",               1,        100,
              "2020-05",               2,         80,
              "2020-04",               1,        100,
              "2020-04",               2,         80,
              "2020-04",               3,         50
              )

# A tibble: 6 x 3
  cohort  months_elapsed act_patients
  <chr>            <dbl>     <dbl>
1 2020-06              1       100
2 2020-05              1       100
3 2020-05              2        80
4 2020-04              1       100
5 2020-04              2        80
6 2020-04              3        50

I am looking for a possible solution in dplyr, any help is highly appreciated!!

Upvotes: 2

Views: 342

Answers (2)

Ben
Ben

Reputation: 30474

I believe this might be what you're looking for. First create a date column from your year-month cohort. Then, you can filter and select rows where the current date is past the cohort_date plus the number of months that have elapsed.

library(tidyverse)
library(lubridate)

df$cohort_date <- ymd(paste0(df$cohort, "-01"))

filter(df, Sys.Date() > cohort_date + (months(months_elapsed - 1)))

Output

# A tibble: 6 x 4
  cohort  months_elapsed act_patients cohort_date
  <chr>            <dbl>        <dbl> <date>     
1 2020-06              1          100 2020-06-01 
2 2020-05              1          100 2020-05-01 
3 2020-05              2           80 2020-05-01 
4 2020-04              1          100 2020-04-01 
5 2020-04              2           80 2020-04-01 
6 2020-04              3           50 2020-04-01

Upvotes: 1

Wimpel
Wimpel

Reputation: 27742

Here is my tidy solution, using %m+% from the lubridate-package.

library( tidyverse )
library( lubridate )

filter_date = as.Date( "2020-07-01" )

df %>% 
  filter( as.Date( paste0( cohort, "-01") ) %m+% months( months_elapsed ) <= filter_date )

# # A tibble: 6 x 3
#   cohort  months_elapsed act_patients
#   <chr>            <dbl>        <dbl>
# 1 2020-06              1          100
# 2 2020-05              1          100
# 3 2020-05              2           80
# 4 2020-04              1          100
# 5 2020-04              2           80
# 6 2020-04              3           50

Upvotes: 1

Related Questions