Filter rows based on multiple conditions using dplyr

Question

df <- data.frame(loc.id = rep(1:2,each = 10), threshold = rep(1:10,times = 2))

For each loc.id, I want to keep the first row where threshold >= 2 and the first row where threshold >= 4. I tried this:

df %>% group_by(loc.id) %>% dplyr::filter(row_number() == which.max(threshold >= 2),row_number() == which.max(threshold >= 4))

I expected a dataframe like this:

      loc.id threshold
        1       2
        1       4
        2       2
        2       4

But it returns an empty data frame.

akrun · Accepted Answer

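The original call returns no rows because filter() combines comma-separated conditions with AND, and a row number cannot equal both which.max indices unless they coincide. Instead, we can slice the rows by concatenating the two which.max indices and taking the unique values (if a group only has thresholds of 4 or more, both conditions give the same index, so unique keeps that row once).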
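To make the indexing concrete, here is a minimal base R sketch of what happens inside one group. The vectors threshold and high are illustrative stand-ins for a single loc.id group, not objects from the question.

```r
# Toy vector standing in for one loc.id group from the question
threshold <- 1:10

# which.max() on a logical vector returns the position of the first TRUE
first_ge2 <- which.max(threshold >= 2)
first_ge4 <- which.max(threshold >= 4)

# Concatenate the two positions and drop duplicates
idx <- unique(c(first_ge2, first_ge4))
threshold[idx]

# Illustrative group where both conditions are first met on the same row:
# unique() keeps that position only once
high <- 5:10
idx_high <- unique(c(which.max(high >= 2), which.max(high >= 4)))
```

Here idx holds positions 2 and 4, while idx_high collapses to the single position 1.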

df %>%
    group_by(loc.id) %>%
    filter(any(threshold >= 2)) %>% # additional check
    #slice(unique(c(which.max(threshold > 2), which.max(threshold > 4))))
    # based on the expected output
    slice(unique(c(which.max(threshold >= 2), which.max(threshold >= 4))))
# A tibble: 4 x 2
# Groups:   loc.id [2]
#  loc.id threshold
#   <int>     <int>
#1      1         2
#2      1         4
#3      2         2
#4      2         4

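Note that there can be groups with no threshold value greater than or equal to 2. For such a group, which.max would return 1 (no element is TRUE, so the first position wins) and the first row would be kept by mistake. The filter(any(threshold >= 2)) step guards against this by keeping only the groups that have at least one qualifying value.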
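The edge case can be seen in base R. The sketch below also introduces first_where, a hypothetical helper (not part of dplyr or of the answer above) that returns an empty index when nothing qualifies. This would additionally cover a group that has values >= 2 but none >= 4, which the any(threshold >= 2) check alone does not appear to handle.

```r
# Illustrative group with no value >= 2
low <- c(0, 1, 1)

# All comparisons are FALSE, so which.max() falls back to position 1
fallback <- which.max(low >= 2)
has_match <- any(low >= 2)

# Hypothetical helper: first TRUE position, or an empty index if none
first_where <- function(cond) {
  if (any(cond)) which.max(cond) else integer(0)
}

# Illustrative group with values >= 2 but none >= 4
mixed <- c(1, 2, 3)
idx_mixed <- unique(c(first_where(mixed >= 2), first_where(mixed >= 4)))
```

Inside the pipeline this could be used as slice(unique(c(first_where(threshold >= 2), first_where(threshold >= 4)))), since slice accepts an empty index and then returns no rows for that group.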
