Ben

Reputation: 65

Select rows by column value based on range of values in another column in R

I have a dataframe similar to this:

x <- data.frame("A" = c(11:24), 
                "B" = c(25,25,25,25,25,37,37,16,16,16,16,16,42,42), 
                "C" = c(1:3,1:2,1:2,1:3,1:2,1:2))

 A  B C
11 25 1
12 25 2
13 25 3
14 25 1
15 25 2
16 37 1
17 37 2
18 16 1
19 16 2
20 16 3
21 16 1
22 16 2
23 42 1
24 42 2

I want to keep only the rows whose value of B occurs together with every value of C (1 to 3) at least once. Here only B = 25 and B = 16 have all three values of C, so my result would look like:

 A  B C
11 25 1
12 25 2
13 25 3
14 25 1
15 25 2
18 16 1
19 16 2
20 16 3
21 16 1
22 16 2

I can't seem to find the right keywords when searching for answers.

Upvotes: 2

Views: 881

Answers (2)

Fino

Reputation: 1784

Another option is to use data.table: count the distinct values of C for each B, then keep only the rows whose B has 3 distinct values of C.

library(data.table)
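setDT(x)  # convert x to a data.table by reference

# inner query: distinct C values per B (column V1), keeping the B's with exactly 3
x[B %in% x[, uniqueN(C), by = B][V1 == 3, B]]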
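Counting distinct values gives the right answer only as long as C never takes a value outside 1 to 3. A minimal sketch that tests membership directly is shown below; it rebuilds the question's data so it runs on its own, and the names `required`, `keep` and `result` are illustrative assumptions rather than anything from the question.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
x <- data.table(A = 11:24,
                B = c(25,25,25,25,25,37,37,16,16,16,16,16,42,42),
                C = c(1:3,1:2,1:2,1:3,1:2,1:2))

required <- 1:3  # assumed set of C values every kept B group must contain

# .I holds the row numbers of the current group; indexing it with a single
# TRUE keeps them all, and with FALSE drops the whole group
keep <- x[, .I[all(required %in% C)], by = B]$V1
result <- x[keep]
result
```

This should also keep working if C later contains other values such as 4, since the test is "contains all of 1:3" rather than "has exactly 3 distinct values".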

Upvotes: 1

akrun

Reputation: 887951

We can use all() inside filter() after grouping by 'B', so that a group is kept only when every value in 1:3 appears in its 'C' column.

library(dplyr)
x %>%
    group_by(B) %>%
    filter(all(1:3 %in% C))
# A tibble: 10 x 3
# Groups:   B [2]
#       A     B     C
#   <int> <dbl> <int>
# 1    11    25     1
# 2    12    25     2
# 3    13    25     3
# 4    14    25     1
# 5    15    25     2
# 6    18    16     1
# 7    19    16     2
# 8    20    16     3
# 9    21    16     1
#10    22    16     2
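If you would rather not load a package, the same group-wise test can probably be written in base R with ave(). This is only a sketch under the question's data; `required`, `keep` and `result` are names made up here for illustration.

```r
# Same data as in the question
x <- data.frame(A = 11:24,
                B = c(25,25,25,25,25,37,37,16,16,16,16,16,42,42),
                C = c(1:3,1:2,1:2,1:3,1:2,1:2))

required <- 1:3  # assumed set of C values each kept B group must contain

# ave() evaluates the function once per B group and writes the result back
# in the original row order (stored as 1/0 because x$C is an integer vector)
keep <- as.logical(ave(x$C, x$B, FUN = function(v) all(required %in% v)))
result <- x[keep, ]
result
```

Unlike the grouped tibble above, `result` here is a plain data.frame that keeps the original row names (1 to 5 and 8 to 12).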

Upvotes: 1
