Reputation: 1145
I have a matrix like this:
[,1] [,2]
[1,] 1 3
[2,] 4 6
[3,] 11 12
[4,] 13 14
I want to convert this matrix to a vector like this:
# indices 1-6, 11-14 = 1, gap indices 7-10 = 0
xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1)
The idea: the matrix contains values from 1 through 14, and the output vector also has length 14. Treat the first column as the start and the second column as the end of a range. For the ranges present in the matrix, i.e., 1-3, 4-6, 11-12, 13-14 (or equivalently 1-6, 11-14), I want the values at these indices to be 1 in the output vector. The gap 7-10 in the matrix should give 0 at indices 7-10 in the output vector.
Sometimes, however, the matrix does not reach the last index. I always know the length of the vector after the transformation, say 20 in this case. The resulting vector should then look like this:
# indices 1-6, 11-14 = 1, gap indices 7-10 = 0, indices 15-20 = 0
xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0)
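For reference, the transformation described above can be written in a straightforward (but loop-like) way with `mapply`; it is mainly useful as a correctness check for faster approaches. This is a sketch: the helper name `expand_ranges_ref` and its argument `n` (the known output length) are hypothetical.

```r
# Hypothetical reference implementation: mark every index in [start, end] with 1.
# m: two-column matrix of start/end positions; n: known length of the output.
expand_ranges_ref <- function(m, n = max(m)) {
  out <- numeric(n)                                   # all zeros by default
  idx <- unlist(mapply(seq, m[, 1], m[, 2], SIMPLIFY = FALSE))
  out[idx] <- 1                                       # indices inside a range become 1
  out
}

m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
expand_ranges_ref(m)       # length 14
expand_ranges_ref(m, 20)   # padded with zeros up to length 20
```

`mapply` still loops over the rows internally, which is exactly what the question wants to avoid for a long matrix.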
How can I do this without a loop? My matrix is quite long, and my loop-based attempt is slow.
Upvotes: 1
Views: 249
Reputation: 89097
Throwing this one here: it uses base R and should be reasonably fast, since the unavoidable loop is handled internally by rep:
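# length of the run of 0s before each range, and of the run of 1s in each range
zero.lengths <- m[,1] - c(0, head(m[,2], -1)) - 1
one.lengths <- m[,2] - m[,1] + 1
# alternate 0, 1, 0, 1, ... and repeat each value by the interleaved run lengths
rep(rep(c(0, 1), nrow(m)),
    as.vector(rbind(zero.lengths, one.lengths)))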
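The rep() version above stops at the largest end position. A sketch that wraps it in a hypothetical helper, `rep_ranges`, and appends trailing zeros up to a known length `n` (20 in the question's edit); it assumes the ranges are sorted, non-overlapping, and that `n >= max(m)`.

```r
# Hypothetical helper around the rep()-based approach, padded to length n.
rep_ranges <- function(m, n = max(m)) {
  zero.lengths <- m[, 1] - c(0, head(m[, 2], -1)) - 1   # gap before each range
  one.lengths  <- m[, 2] - m[, 1] + 1                   # width of each range
  out <- rep(rep(c(0, 1), nrow(m)),
             as.vector(rbind(zero.lengths, one.lengths)))
  c(out, rep(0, n - max(m)))                            # trailing zeros up to n
}

m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
rep_ranges(m)       # length 14
rep_ranges(m, 20)   # length 20
```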
Or another solution using sequence:
out <- integer(m[length(m)]) # or `integer(20)` following OP's edit.
one.starts <- m[,1]
one.lengths <- m[,2] - m[,1] + 1
one.idx <- sequence(one.lengths) + rep(one.starts, one.lengths) - 1L
out[one.idx] <- 1L
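Assuming R 4.0.0 or later, where sequence() gained a `from` argument, the "shift each run by its start" step can be folded into the call itself. A minimal sketch on the question's data, with `n = 20` as the known output length:

```r
# Assumes R >= 4.0.0 (sequence() with the `from` argument).
m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
n <- 20                                     # known output length

one.lengths <- m[, 2] - m[, 1] + 1          # width of each range: 3 3 2 2
one.idx <- sequence(one.lengths, from = m[, 1])
one.idx                                     # 1 2 3 4 5 6 11 12 13 14

out <- integer(n)                           # zeros everywhere
out[one.idx] <- 1L                          # ones inside the ranges
out
```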
Upvotes: 1
Reputation: 118879
Here's an answer using the IRanges package:
library(IRanges)  # Bioconductor package
# here xx is the two-column matrix of start/end positions
xx.ir <- IRanges(start = xx[,1], end = xx[,2])
as.vector(coverage(xx.ir))
# [1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1
If you specify the minimum and maximum index of your entire vector (here 1 and 20), then:
max.val <- 20
min.val <- 1
c(rep(0, min.val-1), as.vector(coverage(xx.ir)), rep(0, max.val-max(xx)))
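coverage() counts, for each position, how many ranges cover it, which is why non-overlapping ranges give a 0/1 vector. For readers without Bioconductor, the same count can be sketched in base R with a difference array: +1 at each start, -1 just after each end, then a cumulative sum. This is not the IRanges implementation; `coverage_base` is a hypothetical name and its `n` argument plays the role of max.val above.

```r
# Base R sketch of a coverage count via a difference array.
coverage_base <- function(m, n = max(m)) {
  # +1 at every start, -1 at every (end + 1); one extra bin for end == n
  d <- tabulate(m[, 1], nbins = n + 1) - tabulate(m[, 2] + 1, nbins = n + 1)
  cumsum(d)[seq_len(n)]                     # running total = ranges covering each position
}

m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
coverage_base(m)        # 1 1 1 1 1 1 0 0 0 0 1 1 1 1
coverage_base(m, 20)    # same, followed by six 0s
```

Like coverage(), overlapping ranges produce counts above 1; wrap the result in `pmin(..., 1L)` if a strict 0/1 vector is needed.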
Upvotes: 2
Reputation: 4432
@Arun's answer seems better.
Now that I understand the problem (or do I?), here is a solution in base R that relies on the idea that only contiguous runs of zeroes need to be kept.
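find.ones <- function (mat) {
  ones <- rep(0, max(mat))
  ones[c(mat)] <- 1                   # mark every start and end position
  ones <- paste0(ones, collapse="")   # for the OP's m: "10110100001111"
  ones <- gsub("101", "111", ones)    # fill a single 0 lying between two 1s
  ones <- as.numeric(strsplit(ones, "")[[1]])
  ones
}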
On the OP's original example:
m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol=2, byrow=TRUE)
find.ones(m)
[1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1
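One caveat: the gsub("101", "111", ...) step only fills a range whose interior is a single index, such as 1-3 or 4-6. For a wider range like 14-18 in the benchmark, the marked string around it reads "110001", which contains no "101", so indices 15-17 stay 0. A base R sketch that handles ranges of any width uses findInterval() to find, for each position, the last range starting at or before it, then checks that position against that range's end. The helper name `ranges_to_binary` is hypothetical, and it assumes the starts are sorted and the ranges do not overlap.

```r
# Hypothetical helper: 1 for every position 1..n that lies inside some row of m.
ranges_to_binary <- function(m, n = max(m)) {
  pos <- seq_len(n)
  row <- findInterval(pos, m[, 1])        # last range with start <= pos (0 if none)
  inside <- row > 0
  # a position is covered only if it is also <= the end of that range
  inside[inside] <- pos[inside] <= m[row[inside], 2]
  as.integer(inside)
}

m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
ranges_to_binary(m)                          # 1 1 1 1 1 1 0 0 0 0 1 1 1 1
ranges_to_binary(m, 20)                      # padded with six 0s
ranges_to_binary(matrix(c(1, 5), ncol = 2))  # 1 1 1 1 1 (find.ones gives 1 0 0 0 1)
```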
To benchmark the solution, let's make a sufficiently large matrix:
set.seed(10)
m <- sample.int(n=1e6, size=5e5)
m <- matrix(sort(m), ncol=2, byrow=TRUE)
head(m)
[,1] [,2]
[1,] 1 3
[2,] 4 5
[3,] 9 10
[4,] 11 13
[5,] 14 18
[6,] 22 23
system.time(ones <- find.ones(m))
user system elapsed
1.167 0.000 1.167
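Timings are only meaningful if the methods agree. A sketch that builds random ranges the same way as the benchmark (distinct sorted values paired row-wise, so ranges never overlap), but smaller, and checks that the rep()-based and sequence()-based answers from this thread produce the same vector of known length `n`:

```r
set.seed(10)
m <- matrix(sort(sample.int(n = 1000, size = 500)), ncol = 2, byrow = TRUE)
n <- 1000                                           # known output length

# rep()-based approach, padded with zeros up to n
zero.lengths <- m[, 1] - c(0, head(m[, 2], -1)) - 1
one.lengths  <- m[, 2] - m[, 1] + 1
a <- rep(rep(c(0, 1), nrow(m)), as.vector(rbind(zero.lengths, one.lengths)))
a <- c(a, rep(0, n - max(m)))

# sequence()-based approach
b <- integer(n)
b[sequence(one.lengths) + rep(m[, 1], one.lengths) - 1L] <- 1L

identical(a, as.numeric(b))   # TRUE if both approaches agree
```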
Upvotes: 1