Reputation: 1332

Closest value to a specific column in R

I would like to find the closest value to column x3 below.

data=data.frame(x1=c(24,12,76),x2=c(15,30,20),x3=c(45,27,15))
data
  x1 x2 x3
1 24 15 45
2 12 30 27
3 76 20 15

So desired output will be

Closest_Value_to_x3
   24
   30
   20

Please help. Thank you

Upvotes: 14

Answers (4)

markus

Reputation: 26343

Use max.col(-abs(data[, 3] - data[, -3])) to find the column positions of the closest values and use this result as part of a matrix to extract desired values from your data. The matrix is returned by cbind

col <- 3
data[, -col][cbind(1:nrow(data),
                   max.col(-abs(data[, col] - data[, -col])))]
#[1] 24 30 20

Upvotes: 12

tmfmnk

Reputation: 39858

A tidyverse solution:

data %>%
  rowid_to_column() %>%
  gather(var, val, -c(x3, rowid)) %>%
  mutate(temp = x3 - val) %>%
  group_by(rowid) %>%
  filter(abs(temp) == min(abs(temp))) %>%
  ungroup() %>%
  select(val)

    val
  <dbl>
1    24
2    30
3    20

First, it adds a row ID. Second, it transforms the data from wide to long. Third, it calculates the difference between "x3" and the other variables. Finally, it groups by the row ID and keeps the rows where the absolute difference is the smallest.

Or:

data %>%
  rowid_to_column() %>%
  gather(var, val, -c(x3, rowid)) %>%
  mutate(temp = x3 - val) %>%
  group_by(rowid) %>%
  filter(abs(temp) == min(abs(temp))) %>%
  ungroup() %>%
  pull(val)

[1] 24 30 20

Or using an approach originally proposed by @markus (it assumes that your columns are named "x"):

data %>%
 mutate(temp = paste0("x", max.col(-abs(.[, -3] - .[, 3])))) %>%
 rowwise() %>%
 summarise(val = eval(as.symbol(temp)))

    val
  <dbl>
1   24.
2   30.
3   20.

First, it is assessing the column index of the variable where the absolute difference in regard to "x3" is the smallest and combines it with "x". Then, it evaluates the combination of x and column index as a variable and returns the appropriate value.

Also borrowing the idea from @markus (not assuming that your columns are named "x"):

data %>%
 mutate(temp = max.col(-abs(.[, -3] - .[, 3]))) %>%
 rowwise %>%
 mutate(temp = names(.)[[temp]]) %>%
 summarise(val = eval(as.symbol(temp)))

First, it is assessing the column index of the variable where the absolute difference in regard to "x3" is the smallest. Second, it returns the column name based on the column index. Finally, it evaluates it as a variable and returns the appropriate value.

Or a variant where you can reference the "x3" variable by its name and not by column index (the basic idea still from @markus):

data %>%
 mutate(temp = max.col(-abs(.[, !grepl("x3", colnames(.))] - .[, grepl("x3", colnames(.))]))) %>% 
 rowwise %>%
 mutate(temp = names(.)[[temp]]) %>%
 summarise(val = eval(as.symbol(temp)))

Upvotes: 4

grand_chat

Reputation: 531

Define a function closest_to_3 that operates on a vector and returns the value in the vector that's closest to the third member:

closest_to_3 <- function(v) v[-3][which.min(abs( v[-3]-v[3] ))]

(The idiom v[-3] deletes the 3rd member from v.) Then apply this function to each row of your data frame:

apply(data, 1, closest_to_3)
#[1] 24 30 20

Upvotes: 2

niko

Reputation: 5281

Here is another approach using matrixStats

x <- as.matrix(data[,-3L])
y <- abs(x - .subset2(data, 3L))
x[matrixStats::rowMins(y) == y]
# [1] 24 30 20

Or in base using vapply

x <- as.matrix(data[,-3L])
y <- abs(x - .subset2(data, 3L))
vapply(1:nrow(data), 
       function(k) x[k,][which.min(y[k,])], 
       numeric(1))
# [1] 24 30 20

Upvotes: 3

Closest value to a specific column in R

Answers (4)

Related Questions