user3067923

Reputation: 457

Find closest value with condition

I have a function that finds the nearest rows for each row of a matrix and returns a list with the indices of those rows. However, I want it to exclude a row if it is +1 in the first AND +1 in the second column away from the row of interest (-1 in the first and -1 in the second column should also be excluded). Likewise, +1 in the first column and -1 in the second column relative to the row of interest should be excluded.

As an example, if I want the rows closest to c(2, 1), it should accept c(3, 1), c(2, 2) or c(1, 1), but NOT c(3, 2) and not c(1, 0).

Basically, for a row to be reported, either column 1 or column 2 should differ by 1 from the row of interest, but not both.

The input looks like this:

x
     v1 v2
[1,]  3  1
[2,]  2  1
[3,]  3  2
[4,]  1  2
[5,]  8  5
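
For reproducibility, here is a minimal sketch of how this input could be built. It assumes x is a plain numeric matrix whose column names are v1 and v2, as in the printout above.

```r
# Rebuild the example matrix; matrix() fills values column by column.
x <- matrix(c(3, 2, 3, 1, 8,    # v1
              1, 1, 2, 2, 5),   # v2
            ncol = 2,
            dimnames = list(NULL, c("v1", "v2")))
x
```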

myfunc(x)

The output looks like this. Notice that for row 2 ($V2 in the output) the function reports rows 1, 3 and 4 as closest. The answer should only be row 1, though.

$V1
[1] 2 3

$V2
[1] 1 3 4

$V3
[1] 1 2

$V4
[1] 2

$V5
integer(0)

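Here is myfunc:

myfunc = function(t){
    # pairwise absolute differences, one column at a time
    d1 <- dist(t[,1])
    d2 <- dist(t[,2])
    # TRUE where both columns differ by at most 1 (this also lets diagonal neighbours through)
    dF <- as.matrix(d1) <= 1 & as.matrix(d2) <= 1
    # a row is not its own neighbour; which() skips NA
    diag(dF) <- NA
    colnames(dF) <- NULL
    # one vector of matching row indices per row
    dF2 <- lapply(as.data.frame(dF), which)
    return(dF2)
    }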

Upvotes: 0

Views: 1179

Answers (1)

Stibu

Reputation: 15947

Basically, the rows that you want to find should differ from your reference row by +1 or -1 in one column and be identical in the other column. That means that the sum of the absolute values of the differences (the Manhattan distance) is exactly one. For your example c(2, 1), this works as follows:

  • c(3, 1): difference is c(1, 0), thus sum(abs(c(1, 0))) = 1 + 0 = 1
  • c(1, 1): difference is c(-1, 0), thus sum(abs(c(-1, 0))) = 1 + 0 = 1
  • etc.
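
As a rough sketch, the same check can be run on all five candidate rows mentioned in the question at once. The names ref, cand and l1 are only illustrative and do not come from the question.

```r
ref <- c(2, 1)  # reference row from the question
# Candidate rows mentioned in the question, one per matrix row.
cand <- rbind(c(3, 1), c(2, 2), c(1, 1), c(3, 2), c(1, 0))
# Sum of absolute differences to the reference, column by column.
l1 <- abs(cand[, 1] - ref[1]) + abs(cand[, 2] - ref[2])
# keep is TRUE only where exactly one column is off by one.
data.frame(v1 = cand[, 1], v2 = cand[, 2], l1 = l1, keep = l1 == 1)
```

The first three candidates have a sum of 1 and are kept; c(3, 2) and c(1, 0) have a sum of 2 and are dropped.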

The following function checks exactly this:

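myfunc <- function(x) {
  do_row <- function(r) {
    # repeat the reference row r so that it has the same shape as x
    r_mat <- matrix(rep(r, length.out = length(x)), ncol = ncol(x), byrow = TRUE)
    # absolute difference of every row of x to r, per column
    abs_dist <- abs(r_mat - x)
    # keep the rows whose differences sum to exactly 1
    return(which(rowSums(abs_dist) == 1))
  }
  return(apply(x, 1, do_row))
}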

do_row() does the job for a single row, and then apply() is used to do this with each row. For your example, I get:

myfunc(x)
## [[1]]
## [1] 2 3
## 
## [[2]]
## [1] 1
## 
## [[3]]
## [1] 1
## 
## [[4]]
## integer(0)
## 
## [[5]]
## integer(0)
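
One caveat: apply() only returns a list when the per-row results differ in length. If every row happens to have the same number of matches, it simplifies the result to a vector or matrix. A possible variant that always returns a list, sketched here with lapply() and a hypothetical name myfunc_list, could look like this:

```r
# Sketch only: same check as above, but lapply() never simplifies its result.
myfunc_list <- function(x) {
  lapply(seq_len(nrow(x)), function(i) {
    # repeat row i so that it has the same shape as x
    r_mat <- matrix(x[i, ], nrow = nrow(x), ncol = ncol(x), byrow = TRUE)
    unname(which(rowSums(abs(r_mat - x)) == 1))
  })
}

# Every row of y has exactly one match, the case where apply() would simplify.
y <- rbind(c(1, 1), c(1, 2))
res <- myfunc_list(y)
str(res)
```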

Using sweep(), one can write a shorter function:

myfunc2 <- function(x) {
  apply(x, 1, function(r) which(rowSums(abs(sweep(x, 2, r))) == 1))
}

But this seems harder to understand, and it turns out to be slower by about a factor of two for your matrix x. (I have also tried it with a large matrix, and there the efficiency seems to be about the same.)
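
Another option, closer to the dist() approach in the question, is to compute all pairwise Manhattan distances in one call and keep the pairs at distance 1. This is only a sketch under the assumption that x is a numeric matrix; myfunc_dist is a made-up name, and the full distance matrix needs memory proportional to the square of the number of rows.

```r
# Sketch only: pairwise Manhattan distances, then pick the pairs at distance 1.
myfunc_dist <- function(x) {
  d <- as.matrix(dist(x, method = "manhattan"))
  # The diagonal is 0, so a row never matches itself.
  lapply(seq_len(nrow(d)), function(i) unname(which(d[i, ] == 1)))
}

x <- matrix(c(3, 2, 3, 1, 8,
              1, 1, 2, 2, 5), ncol = 2)
res <- myfunc_dist(x)
res
```

For the example matrix this should report the same row indices as the output shown above.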

Upvotes: 2
