chippycentra

Reputation: 879

Subset a data frame using dplyr in R

Using R, I'm trying to filter my data frame according to some conditions.

Here is the data frame:

Groups_name  col1   col2
group1       3       4
group1       1       1
group1       1       1
group2       1       1
group3       3       7
group3       1       1
group4       3       3
group4       1       1

Working by group, I want to keep only the groups that contain at least one row where col1 > 1 and col1 is within ±2 of col2 (that is, col2 - 2 <= col1 <= col2 + 2).

Here I should get:

Groups_name  col1   col2
group1       3      4
group1       1      1
group1       1      1
group4       3      3
group4       1      1

As you can see, I kept group1 because in its first row col1 > 1 and col1 (3) is within 2 of col2 (4 = 3 + 1). I also kept group4 because in its first row col1 > 1 and col1 (3) == col2 (3).

But group2 was removed because its col1 is not > 1.

And I also removed group3 because, even though col1 (3) > 1, col1 (3) is not equal to 7 ± 2 (i.e. not one of 5, 6, 7, 8 or 9).
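The reasoning above boils down to a per-row test followed by a per-group any(). The following is a minimal base R sketch of only that check, assuming the sample data from the question. The helper column row_ok is a hypothetical name introduced for illustration.

```r
# Sample data from the question
df <- data.frame(
  Groups_name = c("group1", "group1", "group1", "group2",
                  "group3", "group3", "group4", "group4"),
  col1 = c(3, 1, 1, 1, 3, 1, 3, 1),
  col2 = c(4, 1, 1, 1, 7, 1, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: TRUE when col1 > 1 and col1 is within 2 of col2
df$row_ok <- df$col1 > 1 & abs(df$col1 - df$col2) <= 2

# A group qualifies if at least one of its rows passes the row test
ok <- tapply(df$row_ok, df$Groups_name, any)
print(ok)
```

This is a way to confirm which groups should survive before writing the actual filter.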

So far I have tried:

tab %>%
  group_by(Groups_name) %>%
  filter(all(col1 == col2,col2-2,col2+2))  %>%
  filter(any(col1 > 1))

Thanks for your help.

Upvotes: 0

Views: 76

Answers (2)

akrun

Reputation: 887871

We can do this with data.table:

library(data.table)
setDT(df)[, .SD[any(col1 >1) & all(abs(col1 - col2) %in% 0:2)], .(Groups_name)]
#   Groups_name col1 col2
#1:      group1    3    4
#2:      group1    1    1
#3:      group1    1    1
#4:      group4    3    3
#5:      group4    1    1

data

df <- structure(list(Groups_name = c("group1", "group1", "group1", 
"group2", "group3", "group3", "group4", "group4"), col1 = c(3L, 
1L, 1L, 1L, 3L, 1L, 3L, 1L), col2 = c(4L, 1L, 1L, 1L, 7L, 1L, 
3L, 1L)), class = "data.frame", row.names = c(NA, -8L))

Upvotes: 1

Ronak Shah

Reputation: 389255

We could use any() and all() as follows:

library(dplyr)
df %>%
  group_by(Groups_name) %>%
  filter(any(col1 > 1) & all(abs(col1 - col2) %in% 0:2))

#  Groups_name  col1  col2
#  <fct>       <int> <int>
#1 group1          3     4
#2 group1          1     1
#3 group1          1     1
#4 group4          3     3
#5 group4          1     1

This keeps the groups in which at least one value of col1 is greater than 1 and the absolute difference between col1 and col2 is between 0 and 2 in every row.
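Two details may matter on other data. First, %in% 0:2 only matches whole-number differences. Second, all() requires every row of a group to be within 2, whereas the question asks for at least one row that meets both conditions. Below is a minimal dplyr sketch of that per-row reading. It is an alternative interpretation, not the code from this answer, and it uses <= 2 so that non-integer columns also work.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Groups_name = c("group1", "group1", "group1", "group2",
                  "group3", "group3", "group4", "group4"),
  col1 = c(3, 1, 1, 1, 3, 1, 3, 1),
  col2 = c(4, 1, 1, 1, 7, 1, 3, 1),
  stringsAsFactors = FALSE
)

# Keep a group if at least one of its rows has col1 > 1 AND col1 within 2 of col2
res <- df %>%
  group_by(Groups_name) %>%
  filter(any(col1 > 1 & abs(col1 - col2) <= 2)) %>%
  ungroup()

print(res)
```

On the sample data this keeps the same five rows (group1 and group4) as both answers. The two readings differ only for a group that has a qualifying row and also another row whose difference exceeds 2.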

Upvotes: 2
