user1938809

Reputation: 1145

R: matrix to indexes

I have a matrix like

      [,1] [,2]
 [1,]    1    3
 [2,]    4    6
 [3,]   11   12
 [4,]   13   14

I want to convert this matrix to a vector like this:

# indices 1-6, 11-14 = 1, gap indices 7-10 = 0
xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1)

The idea: the matrix holds values from 1 through 14, and the output vector also has length 14. Treat the first column as the range starts and the second column as the range ends. For the ranges present in the matrix, i.e., 1-3, 4-6, 11-12, 13-14 (or equivalently 1-6 and 11-14), I want the values at those indices to be 1 in my output vector. The gap 7-10, which no row of the matrix covers, should be 0 at indices 7-10 in my output vector. (Thanks for the edit)

However, sometimes the last range in the matrix does not reach the end of the output vector. I always know the length of the vector after the transformation, though; say it is 20 in this case. Then the resulting vector should look like this:

# indices 1-6, 11-14 = 1, gap indices 7-10 = 0, indices 15-20 = 0
xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0)

How can I do that without a loop? My matrix is quite long, and the loop I tried is slow.

Upvotes: 1

Views: 248

Answers (3)

flodel

Reputation: 89057

Throwing this one in here: it uses base R and should be reasonably fast, since the inevitable loop is handled internally by rep (m is the input matrix):

zero.lengths <- m[,1] - c(0, head(m[,2], -1)) - 1
one.lengths  <- m[,2] - m[,1] + 1

rep(rep(c(0, 1), nrow(m)),
    as.vector(rbind(zero.lengths, one.lengths)))
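As written, this stops at the last range end. A minimal sketch of how the same run-length idea could be padded out to a known length, assuming the ranges are sorted and non-overlapping and that the target length is at least the last range end. The names n, run.values, run.lengths and padded are made up for this illustration:

```r
# Example matrix from the question: column 1 = range start, column 2 = range end.
m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
n <- 20  # assumed known output length (hypothetical name)

# Number of zeros before each range, and number of ones inside each range.
zero.lengths <- m[, 1] - c(0, head(m[, 2], -1)) - 1
one.lengths  <- m[, 2] - m[, 1] + 1

# Interleave the run lengths, then append one final run of zeros up to n.
run.values  <- c(rep(c(0, 1), nrow(m)), 0)
run.lengths <- c(as.vector(rbind(zero.lengths, one.lengths)), n - m[nrow(m), 2])

padded <- rep(run.values, run.lengths)
```

Runs of length zero are simply dropped by rep, so adjacent ranges such as 1-3 and 4-6 need no special handling.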

Or another solution using sequence:

out <- integer(m[length(m)])    # or `integer(20)` following OP's edit.
one.starts  <- m[,1]
one.lengths <- m[,2] - m[,1] + 1
one.idx <- sequence(one.lengths) + rep(one.starts, one.lengths) - 1L
out[one.idx] <- 1L
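For illustration, the sequence-based version can be wrapped in a small function that takes the output length as an argument. This is only a sketch: the function name ranges_to_indicator and the argument n are not from the answer, and it assumes every range lies within 1..n:

```r
# Hypothetical wrapper around the sequence-based approach.
# m: two-column matrix of (start, end) ranges; n: length of the output vector.
ranges_to_indicator <- function(m, n = max(m)) {
  out <- integer(n)                        # all zeros to begin with
  one.lengths <- m[, 2] - m[, 1] + 1       # number of indices in each range
  # sequence() yields 1..len for each range; shifting by (start - 1)
  # turns those into the actual indices covered by the range.
  one.idx <- sequence(one.lengths) + rep(m[, 1], one.lengths) - 1
  out[one.idx] <- 1L
  out
}

m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
res14 <- ranges_to_indicator(m)        # length 14, as in the first example
res20 <- ranges_to_indicator(m, 20)    # padded with zeros up to index 20
```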

Upvotes: 1

Arun

Reputation: 118779

Here's an answer using the IRanges package, where xx is the input matrix:

require(IRanges)
xx.ir <- IRanges(start = xx[,1], end = xx[,2])
as.vector(coverage(xx.ir))
# [1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1

If you know the minimum and maximum index of the entire output vector, you can pad the result with zeros:

max.val <- 20
min.val <- 1
c(rep(0, min.val-1), as.vector(coverage(xx.ir)), rep(0, max.val-max(xx)))
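As a possible alternative to padding by hand, coverage() also takes a width argument that sets the length of the returned coverage vector. A short sketch, assuming the IRanges package (from Bioconductor) is installed; the names m, ir and cov20 are only for this example, and the guard just skips the code when the package is missing:

```r
# Runs only if IRanges is available; otherwise nothing is computed.
if (requireNamespace("IRanges", quietly = TRUE)) {
  suppressPackageStartupMessages(library(IRanges))

  # Example matrix from the question: column 1 = start, column 2 = end.
  m  <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
  ir <- IRanges(start = m[, 1], end = m[, 2])

  # width = 20L asks for a coverage vector of length 20, so positions
  # after the last range end come back as zeros.
  cov20 <- as.vector(coverage(ir, width = 20L))
}
```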

Upvotes: 2

asb

Reputation: 4432

@Arun's answer seems better.

Now that I understand the problem (or do I?), here is a solution in base R that makes use of the idea that only contiguous sequences of zeroes need to be kept.

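find.ones <- function (mat) {
  # Start from all zeros, then mark every range start and end with 1.
  ones <- rep(0, max(mat))
  ones[c(mat)] <- 1
  # Collapse to one string and fill one-position gaps between marked endpoints.
  # Note: "101" only matches a gap of width one, so ranges of up to three
  # indices are filled completely; wider ranges (e.g. 14-18) keep zeros inside.
  ones <- paste0(ones, collapse="")
  ones <- gsub("101", "111", ones)
  # Split the string back into a numeric 0/1 vector.
  ones <- as.numeric(strsplit(ones, "")[[1]])
  ones
}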

On the OP's original example:

m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol=2, byrow=TRUE)
find.ones(m)
[1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1

To benchmark the solution, let's make a matrix big enough:

set.seed(10)
m <- sample.int(n=1e6, size=5e5)
m <- matrix(sort(m), ncol=2, byrow=TRUE)

head(m)
     [,1] [,2]
[1,]    1    3
[2,]    4    5
[3,]    9   10
[4,]   11   13
[5,]   14   18
[6,]   22   23

system.time(ones <- find.ones(m))

 user  system elapsed 
1.167   0.000   1.167 
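One caveat: gsub("101", "111", ...) only fills a gap of a single position, so a range such as 14-18 in the benchmark matrix would keep zeros between its endpoints. A different technique that handles ranges of any width is a difference array followed by cumsum. The sketch below is not the function timed above; the name find.ones.cumsum and the argument n are made up here, and it assumes all starts and ends are positive integers no larger than n:

```r
# Hypothetical variant: mark +1 at each range start and -1 just past each
# range end, then take a running sum to get the number of covering ranges.
find.ones.cumsum <- function(mat, n = max(mat)) {
  starts <- tabulate(mat[, 1], nbins = n + 1L)      # count of ranges opening at i
  ends   <- tabulate(mat[, 2] + 1, nbins = n + 1L)  # count of ranges closed before i
  depth  <- cumsum(starts - ends)[seq_len(n)]       # ranges covering each index
  as.integer(depth > 0)
}

m.small <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
small14 <- find.ones.cumsum(m.small)       # same ranges as the original example
small20 <- find.ones.cumsum(m.small, 20)   # padded with zeros up to index 20

# A single wide range, which the "101" substitution would not fill.
wide <- find.ones.cumsum(matrix(c(14, 18), ncol = 2), 20)
```

Because the work is done by tabulate and cumsum, the cost grows with the number of rows plus the output length rather than with the widths of the individual ranges.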

Upvotes: 1
