roger
roger

Reputation: 149

Filter groups that have a difference more than a specific value in R

I actually want to group by v1, and filter only those that the difference between min(v3) and max(v3) is bigger than 6 I have the following data frame

v1  v2  v3
a   2   13
b   5   3
c   2   1
d   2   1
e   1   2
a   2   4
a   8   1
e   1   9
b   0   1
c   2   8
d   1   5

If we calculate we will find :

a   “13-1”  =12
b   “3-1”   =2
c   “8-1”   =7
d   “5-1”   =4
e   “9-2”   =7

So the expected result would be keeping values of groups a, c and e as they are all >=6

v1  v2  v3
a   2   13
c   2   1
e   1   2
a   2   4
a   8   1
e   1   9
c   2   8

Upvotes: 0

Views: 530

Answers (2)

akrun
akrun

Reputation: 887048

We can use dplyr to group by 'v1' and get the difference of range of 'v3' to be less than 6

library(dplyr)
df1 %>%
  group_by(v1)%>%
  filter(abs(diff(range(v3))) >= 6)
# A tibble: 7 x 3
# Groups:   v1 [3]
#  v1       v2    v3
#  <chr> <int> <int>
#1 a         2    13
#2 c         2     1
#3 e         1     2
#4 a         2     4
#5 a         8     1
#6 e         1     9
#7 c         2     8

Or we can arrange the column 'v3' and then filter by the difference of first and last values

df1 %>% 
    arrange(v1, v3) %>% 
    group_by(v1) %>% 
    filter(last(v3)  - first(v3) >=6)

Or with data.table

library(data.table)
setDT(df1)[, .SD[abs(diff(range(v3))) >= 6], by = v1]

Or another option is .I

setDT(df1)[df1[, .I[abs(diff(range(v3))) >= 6], by = v1]$V1]

Or another option is ave from base R

i1 <- with(df1, ave(v3, v1, FUN = function(x) abs(diff(range(x)))) >= 6)
df1[i1,]

Or using subset and tapply

subset(df1, v1 %in% names(which(tapply(v3, v1, 
           function(x) diff(range(x))) >=6)))

data

df1 <- structure(list(v1 = c("a", "b", "c", "d", "e", "a", "a", "e", 
"b", "c", "d"), v2 = c(2L, 5L, 2L, 2L, 1L, 2L, 8L, 1L, 0L, 2L, 
1L), v3 = c(13L, 3L, 1L, 1L, 2L, 4L, 1L, 9L, 1L, 8L, 5L)), 
class = "data.frame", row.names = c(NA, 
-11L))

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388907

We can group_by v1 and use diff on range of v3 values.

library(dplyr)
df %>%  group_by(v1) %>% filter(diff(range(v3)) >= 6)

#  v1       v2    v3
#  <fct> <int> <int>
#1 a         2    13
#2 c         2     1
#3 e         1     2
#4 a         2     4
#5 a         8     1
#6 e         1     9
#7 c         2     8

OR we can also use max - min

df %>%  group_by(v1) %>%  filter(max(v3) - min(v3) >= 6)

We can use the same in base R ave

subset(df, ave(v3, v1, FUN = function(x) diff(range(x))) >= 6)

Upvotes: 2

Related Questions