89_Simple

Reputation: 3815

Filter rows based on multiple conditions using dplyr

df <- data.frame(loc.id = rep(1:2,each = 10), threshold = rep(1:10,times = 2))

For each loc.id, I want to keep the first row where threshold >= 2 and the first row where threshold >= 4. I tried this:

df %>% group_by(loc.id) %>% dplyr::filter(row_number() == which.max(threshold >= 2),row_number() == which.max(threshold >= 4))

I expected a dataframe like this:

  loc.id threshold
       1         2
       1         4
       2         2
       2         4

But it returns an empty data frame.

Upvotes: 1

Views: 1079

Answers (2)

Nettle

Reputation: 3321

If this isn't quite what you want, assign the result below to a name and use it to filter your dataset.

library(dplyr)

df %>% 
  distinct() %>% 
  filter(threshold == 2 | threshold == 4)
#>   loc.id threshold
#> 1      1         2
#> 2      1         4
#> 3      2         2
#> 4      2         4

Upvotes: 1

akrun

Reputation: 887981

Based on the condition, we can slice the rows by concatenating the two which.max indices and taking the unique values (if the first value of threshold that is at least 2 is also at least 4, both conditions return the same index and unique keeps it once).

df %>%
    group_by(loc.id) %>%
    filter(any(threshold >= 2)) %>% # additional check
    #slice(unique(c(which.max(threshold > 2), which.max(threshold > 4))))
    # based on the expected output
    slice(unique(c(which.max(threshold >= 2), which.max(threshold >= 4))))
# A tibble: 4 x 2
# Groups:   loc.id [2]
#  loc.id threshold
#   <int>     <int>
#1      1         2
#2      1         4
#3      2         2
#4      2         4
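
To see why this works, it may help to look at which.max on a logical vector in isolation. The comparison yields TRUE/FALSE values, which are treated as 1/0, so which.max returns the position of the first TRUE. A minimal base R sketch (the vector x is made up for illustration and is not part of the original data):

```r
# Illustrative vector, not taken from the question's data
x <- c(1, 3, 2, 5, 4)

# which.max treats TRUE/FALSE as 1/0 and returns the first maximum,
# i.e. the position of the first TRUE
first_ge_2 <- which.max(x >= 2)
first_ge_4 <- which.max(x >= 4)

# Combine both positions and drop duplicates
idx <- unique(c(first_ge_2, first_ge_4))
x[idx]
```

This also suggests why the filter() attempt in the question returned nothing: conditions separated by commas in filter() are combined with AND, and no row number can equal both positions at once.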

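Note that there can be groups with no values of threshold greater than or equal to 2. For such a group, which.max would return 1 (the first row) because every comparison is FALSE, so the filter(any(threshold >= 2)) step keeps only the groups that have at least one such value.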
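
The same caveat applies to the second condition: a group may have values of at least 2 but none of at least 4. One possible way to handle both cases, offered as a sketch rather than the only option, is to replace which.max with head(which(...), 1), which returns an empty index when nothing matches. The data frame df2 below is hypothetical and exists only to exercise those edge cases:

```r
library(dplyr)

# Hypothetical data: group 2 has values >= 2 but none >= 4,
# group 3 has no values >= 2 at all
df2 <- data.frame(loc.id = rep(1:3, each = 3),
                  threshold = c(1, 2, 5, 1, 2, 3, 0, 1, 1))

res <- df2 %>%
  group_by(loc.id) %>%
  # head(which(cond), 1) is integer(0) when no row satisfies cond,
  # so unmatched conditions simply contribute no row
  slice(unique(c(head(which(threshold >= 2), 1),
                 head(which(threshold >= 4), 1)))) %>%
  ungroup()

res
```

With this variant no separate any() check is needed, since groups without a match contribute no rows.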

Upvotes: 2
