Trying to find the best way to do this. I am trying to output a table that only contains records that fulfills a condition of 2022 counts > 2021 counts. The table below is what I currently have from the original df. Not sure where to go next from here. year name count 2021 Bob 22 2021 Joe 3 2021 Meg 4 2022 Bob 30 2022 Joe 1 2022 Meg 10 Resulting Table: year name count 2022 Bob 30 2022 Meg 10 The script that I have so far that got me to the top df: df %>% group_by(year = format(as.Date(date, format = "%Y-%m-%d"), "%Y"), name) %>% summarise(count = n()) %>% filter(year == c("2022", "2021") )

Reputation: 77

Filtering in dplyr with multiple conditions

Trying to find the best way to do this. I am trying to output a table that only contains records that fulfills a condition of 2022 counts > 2021 counts. The table below is what I currently have from the original df. Not sure where to go next from here.

year	name	count
2022	Bob	30
2022	Meg	10

Resulting Table:

year	name	count
2022	Bob	30
2022	Meg	10

The script that I have so far that got me to the top df:

df %>% 
  group_by(year = format(as.Date(date, format = "%Y-%m-%d"), "%Y"), name) %>%
  summarise(count = n()) %>%
  filter(year == c("2022", "2021") )

Upvotes: 2

Answers (2)

TarJae

Reputation: 79246

Here is an alternative approach for your example:

library(dplyr)

df %>% 
  group_by(name) %>% 
  slice_max(count) %>% 
  filter(year == 2022)

   year name  count
  <int> <chr> <int>
1  2022 Bob      30
2  2022 Meg      10

Upvotes: 1

akrun

Reputation: 887901

We may arrange the data by 'name', 'year', then grouped by 'name', filter the groups having the second count greater than the first and the 'year' is 2022

library(dplyr)
df %>% 
  arrange(name, year) %>% 
  group_by(name) %>%
  filter(nth(count, 2) > first(count), year == 2022) %>%
  ungroup

-output

# A tibble: 2 × 3
   year name  count
  <int> <chr> <int>
1  2022 Bob      30
2  2022 Meg      10

data

df <- structure(list(year = c(2021L, 2021L, 2021L, 2022L, 2022L, 2022L
), name = c("Bob", "Joe", "Meg", "Bob", "Joe", "Meg"), count = c(22L, 
3L, 4L, 30L, 1L, 10L)), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 3

Filtering in dplyr with multiple conditions

Answers (2)

data

Related Questions