Reputation: 3815
df <- data.frame(loc.id = rep(1:2,each = 10), threshold = rep(1:10,times = 2))
For each loc.id, I want to keep the first row where threshold >= 2 and the first row where threshold >= 4. I did this:
df %>% group_by(loc.id) %>% dplyr::filter(row_number() == which.max(threshold >= 2),row_number() == which.max(threshold >= 4))
I expected a dataframe like this:
loc.id threshold
1 2
1 4
2 2
2 4
But it returns an empty dataframe.
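A likely cause: conditions separated by a comma in filter() are combined with AND, so a row would have to be both the first row with threshold >= 2 (row 2 of each group) and the first row with threshold >= 4 (row 4), which no row can be. A minimal sketch that keeps the original idea but joins the two tests with | (OR); the name res is only for illustration:

```r
library(dplyr)

df <- data.frame(loc.id = rep(1:2, each = 10), threshold = rep(1:10, times = 2))

# which.max() on a logical vector returns the position of the first TRUE,
# evaluated separately within each loc.id group.
# Joining the two index tests with | keeps a row that matches either one.
res <- df %>%
  group_by(loc.id) %>%
  filter(row_number() == which.max(threshold >= 2) |
           row_number() == which.max(threshold >= 4)) %>%
  ungroup()

res
```

This returns the four rows from the expected output (thresholds 2 and 4 for each loc.id).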
Upvotes: 1
Views: 1079
Reputation: 3321
If this isn't exactly what you want, assign the result below to a name and use it to filter your dataset.
df %>%
  distinct() %>%
  filter(threshold == 2 | threshold == 4)
#> loc.id threshold
#> 1 1 2
#> 2 1 4
#> 3 2 2
#> 4 2 4
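Matching the exact values 2 and 4 works here because the example thresholds are consecutive integers. As a rough comparison, here is a sketch on a hypothetical data set df2 (not from the question) where neither 2 nor 4 occurs, contrasting exact matching with picking the first row that satisfies each condition:

```r
library(dplyr)

# Hypothetical data: thresholds skip 2 and 4 entirely.
df2 <- data.frame(loc.id = rep(1:2, each = 5),
                  threshold = rep(c(1, 3, 5, 7, 9), times = 2))

# Exact matching: no threshold equals 2 or 4, so nothing is kept.
exact <- df2 %>% filter(threshold == 2 | threshold == 4)

# Index-based matching: first row with threshold >= 2 and first with >= 4,
# computed per loc.id group.
first_hits <- df2 %>%
  group_by(loc.id) %>%
  filter(row_number() == which.max(threshold >= 2) |
           row_number() == which.max(threshold >= 4)) %>%
  ungroup()

nrow(exact)
first_hits
```

Which behaviour is right depends on whether the question means "rows equal to 2 and 4" or "first rows crossing 2 and 4".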
Upvotes: 1
Reputation: 887981
Based on the conditions, we can slice the rows at the indices returned by the two which.max calls: concatenate the two indices and take the unique values (if every threshold in a group is at least 4, both conditions return the same index, and unique keeps that row only once).
df %>%
group_by(loc.id) %>%
filter(any(threshold >= 2)) %>% # additional check
#slice(unique(c(which.max(threshold > 2), which.max(threshold > 4))))
# based on the expected output
slice(unique(c(which.max(threshold >= 2), which.max(threshold >= 4))))
# A tibble: 4 x 2
# Groups: loc.id [2]
# loc.id threshold
# <int> <int>
#1 1 2
#2 1 4
#3 2 2
#4 2 4
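To see why unique() is needed, here is a minimal base R check on a hypothetical vector x whose first value already satisfies both conditions:

```r
# Hypothetical group where the first value is already >= 4.
x <- c(5L, 6L, 7L)

# Both conditions point at position 1, so the index vector has a duplicate.
idx <- c(which.max(x >= 2), which.max(x >= 4))

# unique() collapses the duplicate, so slice() would return that row once.
kept <- unique(idx)

idx
x[kept]
```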
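Note that there can be groups with no threshold values greater than or equal to 2. Because which.max() returns 1 when every value is FALSE, such a group would wrongly return its first row, so the filter(any(threshold >= 2)) step keeps only the groups that have at least one such value.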
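A small sketch of that pitfall and the guard, using a hypothetical df3 in which loc.id 3 has no threshold >= 2. The same which.max behaviour would also affect a group that has values >= 2 but none >= 4; this sketch does not guard against that case.

```r
library(dplyr)

# which.max() on an all-FALSE logical vector returns 1, not an empty result.
which.max(c(FALSE, FALSE, FALSE))

# Hypothetical data: loc.id 3 never reaches threshold 2.
df3 <- data.frame(loc.id = c(rep(1L, 5), rep(3L, 3)),
                  threshold = c(1:5, 1L, 1L, 1L))

res3 <- df3 %>%
  group_by(loc.id) %>%
  filter(any(threshold >= 2)) %>%  # drop groups with no threshold >= 2
  slice(unique(c(which.max(threshold >= 2), which.max(threshold >= 4)))) %>%
  ungroup()

res3
```

Without the filter(any(...)) step, loc.id 3 would contribute its first row (threshold 1).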
Upvotes: 2