winlai
winlai

Reputation: 91

Partially filter table if it meets certain conditions using dplyr

I am using dplyr to try and partially filter a data frame, but only if a variable meets a certain condition. Here is a sample code of the data frame I am trying to filter:

tbl <- tribble(
  ~account, ~date, ~id, ~report_start, ~report_end,
  "a", as.Date('2022-09-03'), NA, NA, NA,
  "b", as.Date('2022-09-03'), NA, NA, NA,
  "c", as.Date('2022-09-03'), NA, NA, NA,
  "c", as.Date('2022-09-03'), NA, NA, NA,
  "e", as.Date('2022-09-03'), 1, as.Date('2022-07-04'), as.Date('2022-10-02'),
  "e", as.Date('2022-09-03'), 1, as.Date('2022-10-03'), as.Date('2022-12-15'),
  "e", as.Date('2022-09-03'), 2, as.Date('2022-03-28'), as.Date('2022-07-03'),
  "f", as.Date('2022-09-03'), NA, NA, NA,
  "g", as.Date('2022-09-03'), 2, as.Date('2022-07-04'), as.Date('2022-10-02'),
  "g", as.Date('2022-09-03'), 3, as.Date('2022-07-04'), as.Date('2022-12-15'),
)

and this is what the sample table looks like:

enter image description here

What I am trying to do is filter the table so that any rows where id is NA then I would like to keep them unchanged. But for rows where id is not NA, I would like to filter the table and only keep the rows where date >= report_start and <= report_end.

Ideally the table will filter down to look like this, where accounts that are NA for id remain untouched in the table, while only filtering the accounts where there was an id number for date between report_start and report_end:

enter image description here

I've tried using the following code to try and filter the table, but i end up filtering out all my rows where id is NA:

df <- tbl %>%
  filter(!is.na(id), (date >= report_start & date <= report_end))

I've tried other dplyr methods but cant seem to get it to work so any assistance would be much appreciated.

Upvotes: 1

Views: 29

Answers (1)

Harrison Jones
Harrison Jones

Reputation: 2506

Your attempt was close and had the right idea, but using a comma in filter is the equivalent of an "AND" clause which I don't think gave you all of what you needed. You also want to specify the rows of NAs that you want to keep using |. Try this instead

tbl %>%
  filter(is.na(id) | (date >= report_start & date <= report_end))

Upvotes: 0

Related Questions