BrTarolg
BrTarolg

Reputation: 35

Choose closest x elements by index in a list/vector

If I have a vector such as x <-c(1,2,3,4,5,6,7,8,9), I want a function f such that f(vector,index,num) where it takes the vector and gives me num "closest" elements to that one on the index Examples: f(x,3,4) = c(1,2,4,5) f(x,1,5) = c(2,3,4,5,6) f(x,8,3) = c(6,7,9)

Since there is also the issue where if we have an odd num, we will need to choose whether to pick left or right side by symmetry, let's go with choosing the left side (but right side is ok too) i.e f(x,4,5) = c(1,2,3,5,6) and f(x,7,3) = c(5,6,8)

I hope my question is clear, thank you for any help/responses!

edit: The original vector of c(1:9) is arbitrary, the vector could be a vector of strings, or a vector of length 1000 with shuffled numbers with repeats etc.

i.e c(1,7,4,2,3,7,2,6,234,56,8)

Upvotes: 2

Views: 184

Answers (3)

smci
smci

Reputation: 33940

num_closest_by_indices <- function(v, idx, num) {
  # Try the base case, where idx is not within (num/2) of the edge
  i <- abs(seq_along(x) - idx)
  i[idx] <- +Inf # sentinel

  # If there are not enough elements in the base case, incrementally add more
  for (cutoff_idx in seq(floor(num/2), num)) {
    if (sum(i <= cutoff_idx) >= num) {
      # This will add two extra indices every iteration. Strictly if we have an even length, we should add the leftmost one first and `continue`, to break ties towards the left.
      return(v[i <= cutoff_idx])
    }
  }
} 

Here's an illustration of this algorithm: we rank the indices in order of desirability, then pick the lowest num legal ones:

> seq_along(x)
  1 2 3 4 5 6 7 8 9
> seq_along(x) - idx
  -2 -1  0  1  2  3  4  5  6
> i <- abs(seq_along(x) - idx)
   2  1  0  1  2  3  4  5  6
> i[idx] <- +Inf # sentinel to prevent us returning the element itself
   2   1 Inf   1   2   3   4   5   6

Now we can just find num elements with smallest values (break ties arbitrarily, unless you have a preference (left)). Our first guess is all indices <= (num/2) ; this might not be enough if index is within (num/2) of the start/end.

> i <= 2
  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
> v[i <= 2]
  1 2 4 5

So, adapting @dash2's code to handle the corner cases where some indices are illegal (nonpositive, or > length(x)), i.e. ! %in% 1:L. Then min(elems) would be the number of illegal indices which we cannot pick, hence we must pick abs(min(elems)) more.

Notes:

  • in the end the code is simpler and faster to handle it by three piecewise cases. Aww.
  • it actually seems to simplify things if we pick (num+1) indices, then remove idx before returning the answer. Using result[-idx] to remove it.

Upvotes: 2

Martin Morgan
Martin Morgan

Reputation: 46856

Start a function with the variable argument x first, and the reference table and n after

.nearest_n <- function(x, table, n) {

The algorithm assumes that table is numeric, without any duplicates, and all values finite; n has to be less than or equal to the length of the table

    ## assert & setup
    stopifnot(
        is.numeric(table), !anyDuplicated(table), all(is.finite(table)),
        n <= length(table)
    )

Sort the table and then 'clamp' maximum and minimum values

    ## sort and clamp
    table <- c(-Inf, sort(table), Inf)
    len <- length(table)

Find the interval in table where x occurs; findInterval() uses an efficient search. Use the interval index as the initial lower index, and add 1 for the upper index, making sure to stay in-bounds.

    ## where to start?
    lower <- findInterval(x, table)
    upper <- min(lower + 1L, len)

Find the nearest n neighbors by comparing the lower and upper index distance to x, record the nearest value, and increment the lower or upper index as appropriate and making sure to stay in-bounds

    ## find
    nearest <- numeric(n)
    for (i in seq_len(n)) {
        if (abs(x - table[lower]) < abs(x - table[upper])) {
            nearest[i] = table[lower]
            lower = max(1L, lower - 1L)
        } else {
            nearest[i] = table[upper]
            upper = min(len, upper + 1L)
        }
    }

Then return the solution and finish the function

    nearest
}

The code might seem verbose, but is actually relatively efficient because the only operations on the entire vector (sort(), findInterval()) are implemented efficiently in R.

A particular advantage of this approach is that it can be vectorized in it's first argument, calculating the test for using lower (use_lower = ...) as a vector and using pmin() / pmax() as clamps.

.nearest_n <- function(x, table, n) {
    ## assert & setup
    stopifnot(
        is.numeric(table), !anyDuplicated(table), all(is.finite(table)),
        n <= length(table)
    )

    ## sort and clamp
    table <- c(-Inf, sort(table), Inf)
    len <- length(table)

    ## where to start?
    lower <- findInterval(x, table)
    upper <- pmin(lower + 1L, len)

    ## find
    nearest <- matrix(0, nrow = length(x), ncol = n)
    for (i in seq_len(n)) {
        use_lower <- abs(x - table[lower]) < abs(x - table[upper])
        nearest[,i] <- ifelse(use_lower, table[lower], table[upper])
        lower[use_lower] <- pmax(1L, lower[use_lower] - 1L)
        upper[!use_lower] <- pmin(len, upper[!use_lower] + 1L)
    }

    # return
    nearest
}

For instance

> set.seed(123)
> table <- sample(100, 10)
> sort(table)
 [1]  5 29 41 42 50 51 79 83 86 91
> .nearest_n(c(30, 20), table, 4)
     [,1] [,2] [,3] [,4]
[1,]   29   41   42   50
[2,]   29    5   41   42

Generalize this by taking any argument and coercing it to the required form using a reference look-up table table0 and the indexes into it table1

nearest_n <- function(x, table, n) {
    ## coerce to common form
    table0 <- sort(unique(c(x, table)))
    x <- match(x, table0)
    table1 <- match(table, table0)

    ## find nearest
    m <- .nearest_n(x, table1, n)

    ## result in original form
    matrix(table0[m], nrow = nrow(m))
}

As an example...

> set.seed(123)
> table <- sample(c(letters, LETTERS), 30)
> nearest_n(c("M", "Z"), table, 5)
     [,1] [,2] [,3] [,4] [,5]
[1,] "o"  "L"  "O"  "l"  "P" 
[2,] "Z"  "z"  "Y"  "y"  "w" 

Upvotes: 0

user3603486
user3603486

Reputation:

Like so:

f <- function (vec, elem, n) {
  elems <- seq(elem - ceiling(n/2), elem + floor(n/2))
  if (max(elems) > length(vec)) elems <- elems - (max(elems) - length(vec))
  if (elems[1] < 1) elems <- elems + (1 - elems[1])
  elems <- setdiff(elems, elem)
  vec[elems]
}

Giving results:

> f(1:9, 1, 5)
[1] 2 3 4 5 6
> f(1:9, 9, 5)
[1] 4 5 6 7 8
> f(1:9, 2, 5)
[1] 1 3 4 5 6
> f(1:9, 4, 5)
[1] 1 2 3 5 6
> f(1:9, 4, 4)
[1] 2 3 5 6
> f(1:9, 2, 4)
[1] 1 3 4 5
> f(1:9, 1, 4)
[1] 2 3 4 5
> f(1:9, 9, 4)
[1] 5 6 7 8

Upvotes: 1

Related Questions