Quickest way of finding the index of an integer vector in a matrix in R

Question

I have the following problem in R (for a Markov chain). Suppose there is a statespace matrix S with rows of unique integer vectors (states). I am given a vector s from this matrix, and want to determine the index of the row that corresponds to this vector. There are a couple of solutions:

A solution using all.equal as in:

which(apply(S,1,function(x){ isTRUE(all.equal(s,x)) }) )

Map the vectors to a unique string and identify them with this string:

statecodes <- apply(S,1,function(x) paste(x,collapse=" ") ) 
check.equal <- function(s) {
    z <- which(statecodes == paste(s, collapse=" "))
    return(z)
}
check.equal(s)

The first (often suggested) solution is downright terrible; it already takes 2.16 seconds for a state space of 16,000 vectors with length 4. The second solution is a lot faster, taking 0-0.01 seconds for the same state space. However, when the length of the vectors increases, it becomes increasingly slow. I feel that my string method is reasonable, but there must be something better. What would be a quicker way to make such comparisons?

For completeness sake, the state space for my problem could be generated as follows. If the vector has N elements, and I denotes the maximum that each element of a vector can attain (for example, 10) it is given by:

I <- rep(10,N)
S <- as.matrix(expand.grid( lapply(1:N, function(i) { 0:I[i]}) ) )

How can the integrality of the states be exploited in order to make an as quick as possible comparison?

Aaron - mostly inactive · Accepted Answer

One simple way of getting an integer value for each state is to cast the value to an integer and then multiply each column by the right base.

My version of that is makecheck2; the version using paste is makecheck2. I've also modified the paste version to use match so it can check multiple values at the same time. Both versions now return a function to be used to get the match.

The setup for my version is faster; 0.065 sec vs 1.552 sec.

N <- 5
I <- rep(10,N)
S <- as.matrix(expand.grid( lapply(1:N, function(i) { 0:I[i]}) ) )
system.time(f1 <- makecheck1(S))
#   user  system elapsed 
#  1.547   0.000   1.552 
system.time(f2 <- makecheck2(S))
#   user  system elapsed 
#  0.063   0.000   0.065

Here I test with 1 to 10000 values to check. The paste version is faster for small values; my version is faster for large values.

> set.seed(5)
> k <- lapply(0:4, function(idx) sample(1:nrow(S), 10^idx))
> s <- lapply(k, function(idx) S[idx,])
> t1 <- sapply(s, function(x) unname(system.time(for(i in 1:100) f1(x))[1]))
> t2 <- sapply(s, function(x) unname(system.time(for(i in 1:100) f2(x))[1]))
> data.frame(n=10^(0:4), time1=t1, time2=t2)
      n time1 time2
1     1 0.761 1.512
2    10 0.772 1.523
3   100 0.857 1.552
4  1000 1.592 1.547
5 10000 9.651 1.848

Code for both versions follow:

makecheck2 <- function(m) {
  codes <- vector("list", length=ncol(m))
  top <- vector("integer", length=ncol(m)+1)
  top[1L] <- 1L
  for(idx in 1:ncol(m)) {
    codes[[idx]] <- unique(m[,idx])
    top[idx+1L] <- top[idx]*length(codes[[idx]])
  }
  getcode <- function(x) {
    out <- 0L
    for(idx in 1:length(codes)) {
      out <- out + top[idx]*match(x[,idx], codes[[idx]])
    }
    out
  }
  key <- getcode(m)
  f <- function(x) {
    if(!is.matrix(x)) {
      x <- matrix(x, ncol=length(codes))
    }
    match(getcode(x), key)
  }
  rm(m) # perhaps there's a better way to remove these from the closure???
  rm(idx)
  f
}

makecheck1 <- function(m) {
  n <- ncol(m)
  statecodes <- apply(m,1,function(x) paste(x,collapse=" ") )
  rm(m)
  function(x) {
    if(!is.matrix(x)) {
      x <- matrix(x, ncol=n)
    }
    x <- apply(x, 1, paste, collapse=" ")
    match(x, statecodes)
  }
}

Quickest way of finding the index of an integer vector in a matrix in R

Answers (2)

Related Questions