Reputation: 1545
I have the following problem in R (for a Markov chain). Suppose there is a statespace matrix S with rows of unique integer vectors (states). I am given a vector s from this matrix, and want to determine the index of the row that corresponds to this vector. There are a couple of solutions:
A solution using all.equal
as in:
which(apply(S,1,function(x){ isTRUE(all.equal(s,x)) }) )
Map the vectors to a unique string and identify them with this string:
statecodes <- apply(S,1,function(x) paste(x,collapse=" ") )
check.equal <- function(s) {
z <- which(statecodes == paste(s, collapse=" "))
return(z)
}
check.equal(s)
The first (often suggested) solution is downright terrible; it already takes 2.16 seconds for a state space of 16,000 vectors with length 4. The second solution is a lot faster, taking 0-0.01 seconds for the same state space. However, when the length of the vectors increases, it becomes increasingly slow. I feel that my string method is reasonable, but there must be something better. What would be a quicker way to make such comparisons?
For completeness sake, the state space for my problem could be generated as follows. If the vector has N elements, and I denotes the maximum that each element of a vector can attain (for example, 10) it is given by:
I <- rep(10,N)
S <- as.matrix(expand.grid( lapply(1:N, function(i) { 0:I[i]}) ) )
How can the integrality of the states be exploited in order to make an as quick as possible comparison?
Upvotes: 3
Views: 298
Reputation: 37824
One simple way of getting an integer value for each state is to cast the value to an integer and then multiply each column by the right base.
My version of that is makecheck2
; the version using paste
is makecheck2
. I've also modified the paste
version to use match
so it can check multiple values at the same time. Both versions now return a function to be used to get the match.
The setup for my version is faster; 0.065 sec vs 1.552 sec.
N <- 5
I <- rep(10,N)
S <- as.matrix(expand.grid( lapply(1:N, function(i) { 0:I[i]}) ) )
system.time(f1 <- makecheck1(S))
# user system elapsed
# 1.547 0.000 1.552
system.time(f2 <- makecheck2(S))
# user system elapsed
# 0.063 0.000 0.065
Here I test with 1 to 10000 values to check. The paste
version is faster for small values; my version is faster for large values.
> set.seed(5)
> k <- lapply(0:4, function(idx) sample(1:nrow(S), 10^idx))
> s <- lapply(k, function(idx) S[idx,])
> t1 <- sapply(s, function(x) unname(system.time(for(i in 1:100) f1(x))[1]))
> t2 <- sapply(s, function(x) unname(system.time(for(i in 1:100) f2(x))[1]))
> data.frame(n=10^(0:4), time1=t1, time2=t2)
n time1 time2
1 1 0.761 1.512
2 10 0.772 1.523
3 100 0.857 1.552
4 1000 1.592 1.547
5 10000 9.651 1.848
Code for both versions follow:
makecheck2 <- function(m) {
codes <- vector("list", length=ncol(m))
top <- vector("integer", length=ncol(m)+1)
top[1L] <- 1L
for(idx in 1:ncol(m)) {
codes[[idx]] <- unique(m[,idx])
top[idx+1L] <- top[idx]*length(codes[[idx]])
}
getcode <- function(x) {
out <- 0L
for(idx in 1:length(codes)) {
out <- out + top[idx]*match(x[,idx], codes[[idx]])
}
out
}
key <- getcode(m)
f <- function(x) {
if(!is.matrix(x)) {
x <- matrix(x, ncol=length(codes))
}
match(getcode(x), key)
}
rm(m) # perhaps there's a better way to remove these from the closure???
rm(idx)
f
}
makecheck1 <- function(m) {
n <- ncol(m)
statecodes <- apply(m,1,function(x) paste(x,collapse=" ") )
rm(m)
function(x) {
if(!is.matrix(x)) {
x <- matrix(x, ncol=n)
}
x <- apply(x, 1, paste, collapse=" ")
match(x, statecodes)
}
}
Upvotes: 2
Reputation: 4643
One way to do this is which(colSums(abs(t(S)-V))==0)
where V
is a vector you are looking for.
Upvotes: 3