Conditional matrix adjacency calculation

Question

I have a matrix, e.g.:

set.seed(1)
m = matrix(rep(NA,100), nrow=10)
m[sample(1:100,10)] = 1
m
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [2,]   NA   NA   NA   NA   NA   NA    1   NA   NA    NA
 [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [6,]    1   NA   NA   NA   NA   NA   NA   NA    1    NA
 [7,]   NA   NA    1    1   NA    1   NA   NA   NA     1
 [8,]   NA   NA   NA   NA   NA    1   NA   NA   NA    NA
 [9,]   NA   NA   NA   NA   NA   NA   NA   NA    1    NA
[10,]   NA    1   NA   NA   NA   NA   NA   NA   NA    NA

I want to convert all NA values that are next (adjacent) to a non-NA value to zero. Is there any swishy matrix way to achieve this, without some hideous row-wise and col-wise looping algorithm?

NB. I've re-worked the example to be less ambiguous. I need all NA values above, below, left, and right of a non-NA value to become zero!

bgoldst · Accepted Answer

m[is.na(m) & !(cbind(is.na(m[,-1L]),T) & cbind(T,is.na(m[,-ncol(m)])) & rbind(is.na(m[-1L,]),T) & rbind(T,is.na(m[-nrow(m),])))] <- 0;
m;
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   NA   NA   NA   NA   NA   NA    0   NA   NA    NA
##  [2,]   NA   NA   NA   NA   NA    0    1    0   NA    NA
##  [3,]   NA   NA   NA   NA   NA   NA    0   NA   NA    NA
##  [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [5,]    0   NA   NA   NA   NA   NA   NA   NA    0    NA
##  [6,]    1    0    0    0   NA    0   NA    0    1     0
##  [7,]    0    0    1    1    0    1    0   NA    0     1
##  [8,]   NA   NA    0    0    0    1    0   NA    0     0
##  [9,]   NA    0   NA   NA   NA    0   NA    0    1     0
## [10,]    0    1    0   NA   NA   NA   NA   NA    0    NA

The solution works as follows.

We construct a logical index matrix with TRUE where an element is NA AND is adjacent to (above, below, left, or right of) at least one non-NA element. We can then subscript m with the logical index matrix and assign the desired replacement value.

The LHS of the logical conjunction is easy; it is simply is.na(m).

The RHS of the logical conjunction is the trickiest part. We need to perform 4 tests, one for each direction of adjacency. The general algorithm is:

1: Index off the singular index of the adjacency direction that is not adjacent to any other index with respect to that adjacency direction. For example, for the "right direction", we index off the leftmost column, because it is not to the right of any other index. In other words, there is no column that has the leftmost column on its right, so we can ignore it (and must remove it) for the "right direction" computation.

2: Test the submatrix for NAs using is.na().

3: We then must bind (cbind() for horizontal adjacency directions, rbind() for vertical) TRUE on the opposite side (that is, opposite of the index that was removed in step 1) of the resulting logical submatrix. This effectively causes the last index in the adjacency direction to always have a (pseudo-)NA in its adjacency direction, so it will never be replaced due to that adjacency direction.

4: Logical AND the 4 tests. The result will be a logical matrix with TRUE for elements that have an NA in every adjacent cell.

5: Negate the result of step 4. This will produce a logical matrix with TRUE for elements that have at least one non-NA in any adjacent cell.

Note that there's another way to do this, which is perhaps slightly more intuitive. We can write each of the 4 tests to test for non-NA, as opposed to NA, and then logical OR them together. This would also require binding FALSE instead of TRUE for the last index. It would look like this:

m[is.na(m) & (cbind(!is.na(m[,-1L]),F) | cbind(F,!is.na(m[,-ncol(m)])) | rbind(!is.na(m[-1L,]),F) | rbind(F,!is.na(m[-nrow(m),])))] <- 0;
m;
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   NA   NA   NA   NA   NA   NA    0   NA   NA    NA
##  [2,]   NA   NA   NA   NA   NA    0    1    0   NA    NA
##  [3,]   NA   NA   NA   NA   NA   NA    0   NA   NA    NA
##  [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [5,]    0   NA   NA   NA   NA   NA   NA   NA    0    NA
##  [6,]    1    0    0    0   NA    0   NA    0    1     0
##  [7,]    0    0    1    1    0    1    0   NA    0     1
##  [8,]   NA   NA    0    0    0    1    0   NA    0     0
##  [9,]   NA    0   NA   NA   NA    0   NA    0    1     0
## [10,]    0    1    0   NA   NA   NA   NA   NA    0    NA

The first approach is preferable because it requires only a single negation, whereas the second approach requires 4 negations.

Benchmarking

library(raster);
library(microbenchmark);

bgoldst1 <- function(m) { m[is.na(m) & !(cbind(is.na(m[,-1L]),T) & cbind(T,is.na(m[,-ncol(m)])) & rbind(is.na(m[-1L,]),T) & rbind(T,is.na(m[-nrow(m),])))] <- 0; m; };
bgoldst2 <- function(m) { m[is.na(m) & (cbind(!is.na(m[,-1L]),F) | cbind(F,!is.na(m[,-ncol(m)])) | rbind(!is.na(m[-1L,]),F) | rbind(F,!is.na(m[-nrow(m),])))] <- 0; m; };
geotheory <- function(m) { r <- raster(m,crs='+init=epsg:27700'); extent(r) <- extent(1,ncol(m),1,nrow(m)); b <- as.matrix(buffer(r,1)); m[is.na(m) & !is.na(b)] <- 0; m; };

set.seed(1L); m <- matrix(rep(NA,100),nrow=10L); m[sample(1:100,10L)] <- 1;

expected <- bgoldst1(m);
identical(expected,bgoldst2(m));
## [1] TRUE
identical(expected,geotheory(m));
## [1] TRUE

microbenchmark(bgoldst1(m),bgoldst2(m),geotheory(m));
## Unit: microseconds
##          expr      min        lq      mean    median        uq      max neval
##   bgoldst1(m)   89.380   96.0085  110.0142  107.9825  119.1015  197.149   100
##   bgoldst2(m)   87.242   97.5055  111.4725  107.3410  121.2410  176.194   100
##  geotheory(m) 5010.376 5519.7095 6017.3685 5824.4115 6289.9115 9013.201   100

set.seed(1L); NR <- 100L; NC <- 100L; probNA <- 0.9; m <- matrix(sample(c(1,NA),NR*NC,T,c(1-probNA,probNA)),NR);

expected <- bgoldst1(m);
identical(expected,bgoldst2(m));
## [1] TRUE
identical(expected,geotheory(m));
## [1] TRUE

microbenchmark(bgoldst1(m),bgoldst2(m),geotheory(m));
## Unit: milliseconds
##          expr       min        lq      mean    median        uq        max neval
##   bgoldst1(m)  6.815069  7.053484  7.265562  7.100954  7.220269   8.930236   100
##   bgoldst2(m)  6.920270  7.071018  7.381712  7.127683  7.217275  16.034825   100
##  geotheory(m) 56.505277 57.989872 66.803291 58.494288 59.451588 571.142534   100

Conditional matrix adjacency calculation

Answers (2)

Benchmarking

Related Questions