star

Reputation: 775

How to find the difference between a value and its closest value in a vector in R?

I have a vector like below:

x= c(1,23,4,15,8,17,21)

After sorting the values in the vector, we have:

c(1,4,8,15,17,21,23)

My required output is:

c(3, 3, 4, 2, 2, 2, 2) 

which contains the difference between each value and its closest value.

But is there a way to get the output without sorting? I need an output like c(3,2,3,2,4,2,2), in the original order, so that I can tell which sample has the biggest value in the output table (here, the 5th value).

Upvotes: 6

Views: 392

Answers (5)

Pierre L

Reputation: 28441

Nice solutions. Julius' seems to be the fastest (lowest median time), with NicE's a close second:

library(microbenchmark)
library(dplyr)  # for lag() and lead(), used by NicE's approach
set.seed(1262016)
x <- sample(1e5)

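# all.equal() compares only two objects at a time; heroka, NicE, julius and
# Ambler hold the result of each approach
all(sapply(list(NicE, julius, Ambler), function(r) isTRUE(all.equal(heroka, r))))
[1] TRUE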

microbenchmark(

  julius = {d <- diff(sort(x))
  pmin(c(d, NA), c(NA, d), na.rm = TRUE)},

  NicE = {x <- sort(x)
  pmin(abs(x-lag(x)),abs(x-lead(x)),na.rm=T)},

  Heroka = {x= sort(x)
  diffs <- cbind(c(NA,diff(x)),c(diff(x),NA))
  apply(diffs,MARGIN=1, min, na.rm=T)},

  Ambler = {n <- length(x)
  ds <- c(
    x[2] - x[1], 
    sapply(
      2:(n - 1), 
      function(i) min(x[i] - x[i - 1], x[i + 1] - x[i])
    ),
    x[n] - x[n - 1]
  )}
)
# Unit: milliseconds
#   expr        min         lq      mean     median        uq       max neval
# julius   4.167302   5.066164  13.94478   7.967066  10.11920  89.06298   100
# NicE     4.678274   6.804918  13.85149   9.297575  12.45606  83.41032   100
# Heroka 142.107887 176.768431 199.96590 196.269671 221.05851 299.30336   100
# Ambler 268.724129 309.238792 334.66432 329.252146 359.88103 409.38698   100

Upvotes: 7

NicE

Reputation: 21425

You could try:

library(dplyr)
x <- sort(x)
pmin(abs(x - lag(x)), abs(x - lead(x)), na.rm = TRUE)
#[1] 3 3 4 2 2 2 2

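x - lag(x) is the difference from the closest smaller number and x - lead(x) the difference from the closest bigger number (negative, hence abs()); pmin(..., na.rm = TRUE) keeps the smaller of the two and ignores the NA that lag() and lead() produce at each end.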
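For readers without dplyr, a base-R sketch of the same idea: on a plain vector, lag(x) and lead(x) just shift the values one position right or left and pad with NA, so both can be written by hand with c() and negative indexing. The names prev_x, next_x and res are illustrative, not part of the original answer.

```r
x <- sort(c(1, 23, 4, 15, 8, 17, 21))
n <- length(x)

prev_x <- c(NA, x[-n])  # shifted right, like dplyr::lag(x)
next_x <- c(x[-1], NA)  # shifted left, like dplyr::lead(x)

# Smaller of the left and right gaps; the NA at either end is ignored
res <- pmin(abs(x - prev_x), abs(x - next_x), na.rm = TRUE)
res
# [1] 3 3 4 2 2 2 2
```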

Upvotes: 5

Heroka

Reputation: 13139

If I understand you correctly, you want the smallest difference between each element of a vector and its neighbours.

First, we sort the data.

x= sort(c(1,23,4,15,8,17,21))

Then, we calculate the difference with the left neighbour (which is missing for the first item) and the difference with the right neighbour (which is missing for the last item):

diffs <- cbind(c(NA,diff(x)),c(diff(x),NA))

Now that we have the differences to the left and right of each item, all that's left is to find the smaller one in each row:

res <- apply(diffs, MARGIN = 1, min, na.rm = TRUE)

Note that while this solution contains an explanation, other provided solutions (notably the pmin-approach by @Julius) are probably faster when performance is an issue.
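The speed gap mentioned here comes from apply() calling min() once per row, while pmin() processes both columns of diffs in a single vectorised call. A small check, assuming the same sorted x and diffs as in this answer, that the two give the same result; res_apply and res_pmin are illustrative names.

```r
x <- sort(c(1, 23, 4, 15, 8, 17, 21))
diffs <- cbind(c(NA, diff(x)), c(diff(x), NA))

# Row-wise: one min() call per element
res_apply <- apply(diffs, MARGIN = 1, min, na.rm = TRUE)

# Vectorised: a single pmin() over the left and right columns
res_pmin <- pmin(diffs[, 1], diffs[, 2], na.rm = TRUE)

identical(res_apply, res_pmin)
# [1] TRUE
```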

Upvotes: 7

Julius Vainora

Reputation: 48201

d <- diff(sort(x))
pmin(c(d, NA), c(NA, d), na.rm = TRUE)
# [1] 3 3 4 2 2 2 2
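The question also asks for the result in the original, unsorted order. A minimal sketch building on this pmin() approach: compute the nearest-neighbour gaps on the sorted values, then use the permutation returned by order() to write each gap back to its original position. The names o, gaps and res are illustrative.

```r
x <- c(1, 23, 4, 15, 8, 17, 21)

o <- order(x)                 # positions of x in increasing order
d <- diff(x[o])               # gaps between consecutive sorted values
gaps <- pmin(c(d, NA), c(NA, d), na.rm = TRUE)  # nearest gap, sorted order

res <- numeric(length(x))
res[o] <- gaps                # put each gap back at its original position
res
# [1] 3 2 3 2 4 2 2
which.max(res)                # sample with the largest nearest-neighbour gap
# [1] 5
```

Sorting dominates the cost, so this stays O(n log n) and works for large vectors such as the one in the benchmark.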

Upvotes: 13

Richard Ambler

Reputation: 5030

You could just do it by brute force:

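# Assumes x is already sorted in increasing order and has at least 2 elements
x <- c(1, 4, 8, 15, 17, 21, 23)

n <- length(x)
ds <- c(
  x[2] - x[1],               # first element: only a right neighbour
  vapply(
    seq_len(n - 2) + 1,      # interior elements (empty when n == 2)
    function(i) min(x[i] - x[i - 1], x[i + 1] - x[i]),
    numeric(1)
  ),
  x[n] - x[n - 1]            # last element: only a left neighbour
)
ds
# [1] 3 3 4 2 2 2 2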
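The brute-force idea also works without sorting at all, which gives the output in the original order that the question asks for. A minimal sketch using outer() to build every pairwise distance; it needs O(n^2) memory, so it is only practical for small vectors, not the 1e5-element vector used in the benchmark. The names d and res are illustrative.

```r
x <- c(1, 23, 4, 15, 8, 17, 21)  # original, unsorted order

d <- abs(outer(x, x, "-"))  # d[i, j] = |x[i] - x[j]|
diag(d) <- Inf              # exclude each value's distance to itself
res <- apply(d, 1, min)     # distance to the closest other value
res
# [1] 3 2 3 2 4 2 2
which.max(res)
# [1] 5
```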

Upvotes: 1
