Reputation: 77
Trying to find the best way to do this. I am trying to output a table that only contains records that fulfills a condition of 2022 counts > 2021 counts. The table below is what I currently have from the original df. Not sure where to go next from here.
year | name | count |
---|---|---|
2021 | Bob | 22 |
2021 | Joe | 3 |
2021 | Meg | 4 |
2022 | Bob | 30 |
2022 | Joe | 1 |
2022 | Meg | 10 |
Resulting Table:
year | name | count |
---|---|---|
2022 | Bob | 30 |
2022 | Meg | 10 |
The script that I have so far that got me to the top df:
df %>%
group_by(year = format(as.Date(date, format = "%Y-%m-%d"), "%Y"), name) %>%
summarise(count = n()) %>%
filter(year == c("2022", "2021") )
Upvotes: 2
Views: 231
Reputation: 79214
Here is an alternative approach for your example:
library(dplyr)
df %>%
group_by(name) %>%
slice_max(count) %>%
filter(year == 2022)
year name count
<int> <chr> <int>
1 2022 Bob 30
2 2022 Meg 10
Upvotes: 1
Reputation: 887851
We may arrange
the data by 'name', 'year', then grouped by 'name', filter
the groups having the second count greater than the first
and the 'year' is 2022
library(dplyr)
df %>%
arrange(name, year) %>%
group_by(name) %>%
filter(nth(count, 2) > first(count), year == 2022) %>%
ungroup
-output
# A tibble: 2 × 3
year name count
<int> <chr> <int>
1 2022 Bob 30
2 2022 Meg 10
df <- structure(list(year = c(2021L, 2021L, 2021L, 2022L, 2022L, 2022L
), name = c("Bob", "Joe", "Meg", "Bob", "Joe", "Meg"), count = c(22L,
3L, 4L, 30L, 1L, 10L)), class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 3