Philippe Massicotte

Reputation: 1529

Sub-setting by group closest to defined value

I have a data frame in which I would like to select, within each group, the rows where y is closest to a specific value (e.g., 5).

set.seed(1234)
df <- data.frame(x = c(rep("A", 4),
                       rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))

df

##   x y          z
## 1 A 4 -1.2070657
## 2 A 4  0.2774292
## 3 A 1  1.0844412
## 4 A 1 -2.3456977
## 5 B 6  0.4291247
## 6 B 6  0.5060559
## 7 B 3 -0.5747400
## 8 B 3 -0.5466319

The result would be:

##   x y          z
## 1 A 4 -1.2070657
## 2 A 4  0.2774292
## 3 B 6  0.4291247
## 4 B 6  0.5060559

Thank you, Philippe

Upvotes: 3

Views: 48

Answers (4)

cpander

Reputation: 374

val <- 5
delta <- abs(val - df$y)
# Compare each row with the minimum of its own group of x, not the global minimum
df <- df[delta == ave(delta, df$x, FUN = min), ]

Upvotes: 0

akrun

Reputation: 887048

Here is an option with data.table. Convert the 'data.frame' to a 'data.table' with setDT(df). Grouped by 'x', take the absolute difference of 'y' from 5, find the elements equal to the minimum difference, and get their row indices (.I). Then extract the column holding those row indices ("V1") and use it to subset the dataset.

library(data.table)
setDT(df)[df[, {v1 <- abs(y-5)
               .I[v1==min(v1)]}, x]$V1]
#   x y          z
#1: A 4 -1.2070657
#2: A 4  0.2774292
#3: B 6  0.4291247
#4: B 6  0.5060559
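
The one-liner can also be unpacked into two steps, which may make the role of .I easier to follow. This is only a sketch that rebuilds the question's data so it runs on its own; the names target, dt, idx and res are introduced here for illustration.

```r
library(data.table)

# Rebuild the data from the question
set.seed(1234)
df <- data.frame(x = c(rep("A", 4), rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))

target <- 5               # illustrative name for the reference value
dt <- as.data.table(df)   # makes a copy, so df itself is left untouched

# Step 1: within each group of x, keep the row numbers (.I) whose
# distance to target equals the smallest distance in that group
idx <- dt[, .I[abs(y - target) == min(abs(y - target))], by = x]$V1

# Step 2: subset the table with those row numbers
res <- dt[idx]
res
```

Using as.data.table() instead of setDT() is a deliberate choice in this sketch: setDT() converts df by reference, which may be surprising if df is reused as a plain data.frame later.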

Upvotes: 1

DatamineR

Reputation: 9618

Alternatively using base R:

df[do.call(c, tapply(df$y, df$x, function(v) abs(v - 5) == min(abs(v - 5)))), ]
  x y          z
1 A 4 -1.2070657
2 A 4  0.2774292
5 B 6  0.4291247
6 B 6  0.5060559
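
One caveat: tapply() returns its results in group order, so the concatenated logical vector lines up with the rows only when the data frame is already sorted by x, as it is in the question. A possible variant for unsorted data, sketched here with base R's split() and unsplit(), puts each group's result back at its original row positions. The shuffled df2 and the names keep and res are made up for illustration.

```r
# Rebuild the data from the question, then interleave the groups
set.seed(1234)
df <- data.frame(x = c(rep("A", 4), rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))
df2 <- df[c(5, 1, 7, 2, 3, 6, 4, 8), ]   # rows no longer sorted by x

# For each group, flag the values of y closest to 5 ...
flags <- lapply(split(df2$y, df2$x),
                function(v) abs(v - 5) == min(abs(v - 5)))

# ... and let unsplit() return the flags in the original row order
keep <- unsplit(flags, df2$x)

res <- df2[keep, ]
res
```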

Upvotes: 3

Thierry

Reputation: 18487

library(dplyr)

df %>%
  group_by(x) %>%
  mutate(
    delta = abs(y - 5)
  ) %>%
  filter(delta == min(delta)) %>%
  select(-delta)
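
A shorter variant is possible with slice_min(), assuming dplyr 1.0.0 or later, where that function is available. This sketch rebuilds the question's data so it runs on its own; res is a name introduced here. With with_ties = TRUE (the default), all rows tied for the minimum distance are kept, which matches the expected output in the question.

```r
library(dplyr)

# Rebuild the data from the question
set.seed(1234)
df <- data.frame(x = c(rep("A", 4), rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))

res <- df %>%
  group_by(x) %>%
  # keep the rows with the smallest distance to 5 in each group, ties included
  slice_min(abs(y - 5), n = 1, with_ties = TRUE) %>%
  ungroup()   # drop the grouping so later steps operate on the whole table

res
```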

Upvotes: 4
