BeginnerProgrammer
BeginnerProgrammer

Reputation: 77

Filtering in dplyr with multiple conditions

Trying to find the best way to do this. I am trying to output a table that only contains records that fulfills a condition of 2022 counts > 2021 counts. The table below is what I currently have from the original df. Not sure where to go next from here.

year name count
2021 Bob 22
2021 Joe 3
2021 Meg 4
2022 Bob 30
2022 Joe 1
2022 Meg 10

Resulting Table:

year name count
2022 Bob 30
2022 Meg 10

The script that I have so far that got me to the top df:

df %>% 
  group_by(year = format(as.Date(date, format = "%Y-%m-%d"), "%Y"), name) %>%
  summarise(count = n()) %>%
  filter(year == c("2022", "2021") ) 

Upvotes: 2

Views: 231

Answers (2)

TarJae
TarJae

Reputation: 79214

Here is an alternative approach for your example:

library(dplyr)

df %>% 
  group_by(name) %>% 
  slice_max(count) %>% 
  filter(year == 2022)
   year name  count
  <int> <chr> <int>
1  2022 Bob      30
2  2022 Meg      10

Upvotes: 1

akrun
akrun

Reputation: 887851

We may arrange the data by 'name', 'year', then grouped by 'name', filter the groups having the second count greater than the first and the 'year' is 2022

library(dplyr)
df %>% 
  arrange(name, year) %>% 
  group_by(name) %>%
  filter(nth(count, 2) > first(count), year == 2022) %>%
  ungroup

-output

# A tibble: 2 × 3
   year name  count
  <int> <chr> <int>
1  2022 Bob      30
2  2022 Meg      10

data

df <- structure(list(year = c(2021L, 2021L, 2021L, 2022L, 2022L, 2022L
), name = c("Bob", "Joe", "Meg", "Bob", "Joe", "Meg"), count = c(22L, 
3L, 4L, 30L, 1L, 10L)), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 3

Related Questions