Reputation: 1145
I have a matrix like
     [,1] [,2]
[1,]    1    3
[2,]    4    6
[3,]   11   12
[4,]   13   14
I want to convert this matrix to a vector like this:
# indices 1-6, 11-14 = 1, gap indices 7-10 = 0
xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1)
The idea: the matrix holds values from 1 through 14, and the output vector also has length 14. Treat the first column as the start and the second column as the end of a range. For the ranges present in the matrix, i.e., 1-3, 4-6, 11-12, 13-14 (or equivalently 1-6, 11-14), I want the values at those indices to be 1 in my output vector. The gap of 7-10 in my matrix should have a value of 0 at indices 7-10 in my output vector. (Thanks for the edit)
However, sometimes the last range in the matrix does not reach the end of the vector. I always know the length of the vector after the transformation, though; say it is 20 in this case. Then the resulting vector should look like this:
# indices 1-6, 11-14 = 1, gap indices 7-10 = 0, indices 15-20 = 0
xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0)
How can I do that without a loop? My matrix is quite long, and the loop I tried is slow.
Upvotes: 1
Views: 248
Reputation: 89057
Throwing this one here: it uses base R and should be reasonably fast, since the inevitable loop is handled by rep():
zero.lengths <- m[,1] - c(0, head(m[,2], -1)) - 1
one.lengths <- m[,2] - m[,1] + 1
rep(rep(c(0, 1), nrow(m)),
as.vector(rbind(zero.lengths, one.lengths)))
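To make the run lengths concrete, here is a small sketch that applies the same idea to the question's matrix and then pads the result to a known length. The name total.length and the padding step are assumptions added for illustration; they are not part of the answer above.

```r
# The question's matrix: column 1 = range start, column 2 = range end.
m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol = 2, byrow = TRUE)
total.length <- 20  # assumed: the known final length from the question's edit

# Zeros before each range (gap since the previous range's end): 0 0 4 0
zero.lengths <- m[, 1] - c(0, head(m[, 2], -1)) - 1
# Ones inside each range, both ends included: 3 3 2 2
one.lengths <- m[, 2] - m[, 1] + 1

# Interleave as zero-run, one-run, zero-run, one-run, ...
out <- rep(rep(c(0, 1), nrow(m)),
           as.vector(rbind(zero.lengths, one.lengths)))

# Pad with trailing zeros up to the known length.
out <- c(out, rep(0, total.length - length(out)))
out
```

The padding is needed because the rep() construction stops at the last range's end (14 here), so anything beyond it has to be appended separately.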
Or another solution using sequence():
out <- integer(m[length(m)]) # or `integer(20)` following OP's edit.
one.starts <- m[,1]
one.lengths <- m[,2] - m[,1] + 1
one.idx <- sequence(one.lengths) + rep(one.starts, one.lengths) - 1L
out[one.idx] <- 1L
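As a rough walk-through of how the index vector is built, the sketch below runs the same steps on the question's matrix with the output length fixed at 20. The intermediate names offsets and starts are introduced here only for illustration.

```r
# The question's matrix: column 1 = range start, column 2 = range end.
m <- matrix(c(1L, 3L, 4L, 6L, 11L, 12L, 13L, 14L), ncol = 2, byrow = TRUE)

out <- integer(20)                       # known final length, all zeros
one.lengths <- m[, 2] - m[, 1] + 1L      # 3 3 2 2
offsets <- sequence(one.lengths)         # 1 2 3 1 2 3 1 2 1 2
starts <- rep(m[, 1], one.lengths)       # 1 1 1 4 4 4 11 11 13 13
one.idx <- offsets + starts - 1L         # 1 2 3 4 5 6 11 12 13 14
out[one.idx] <- 1L
out
```

Because the output is preallocated, trailing zeros (indices 15-20 here) come for free and no padding step is needed.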
Upvotes: 1
Reputation: 118779
Here's an answer using the IRanges package:
require(IRanges)
# here xx is the two-column matrix of range starts and ends
xx.ir <- IRanges(start = xx[,1], end = xx[,2])
as.vector(coverage(xx.ir))
# [1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1
If you specify a min and max value for your entire vector length, then:
max.val <- 20
min.val <- 1
c(rep(0, min.val-1), as.vector(coverage(xx.ir)), rep(0, max.val-max(xx)))
Upvotes: 2
Reputation: 4432
@Arun's answer seems better.
Now that I understand the problem (or do I?), here is a solution in base R that makes use of the idea that only contiguous sequences of zeroes need to be kept.
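find.ones <- function (mat) {
  # mark every start and end index with a 1
  ones <- rep(0, max(mat))
  ones[c(mat)] <- 1
  # collapse to a string and fill a single 0 sitting between two 1s
  # (caveat: a range longer than 3 keeps 0s in its interior)
  ones <- paste0(ones, collapse="")
  ones <- gsub("101", "111", ones)
  # split back into a numeric vector
  ones <- as.numeric(strsplit(ones, "")[[1]])
  ones
}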
On the OP's original example:
m <- matrix(c(1, 3, 4, 6, 11, 12, 13, 14), ncol=2, byrow=TRUE)
find.ones(m)
[1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1
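One thing to watch: the gsub("101", "111", ...) step fills only a single index between two marked positions, so a range such as 14-18 would keep zeros inside it, and a one-index gap between two ranges would be filled by mistake. A possible variant, sketched below, swaps in a different technique (a running sum over start/end markers) that handles ranges of any length. The helper name find.ones.cumsum and its n argument are hypothetical and not from the answer.

```r
# Hypothetical helper (not from the answer): put +1 at each range start and
# -1 just past each range end; the running sum is then positive inside ranges.
find.ones.cumsum <- function(mat, n = max(mat)) {
  # tabulate() counts how often each index occurs; n + 1 bins leave room
  # for the "one past the end" marker of a range that ends at n.
  delta <- tabulate(mat[, 1], nbins = n + 1) -
    tabulate(mat[, 2] + 1, nbins = n + 1)
  as.numeric(cumsum(delta)[seq_len(n)] > 0)
}

# Ranges 1-3 and 5-9 with a one-index gap at 4, padded to length 10.
m2 <- matrix(c(1, 3, 5, 9), ncol = 2, byrow = TRUE)
find.ones.cumsum(m2, n = 10)
# [1] 1 1 1 0 1 1 1 1 1 0
```

Passing n explicitly also covers the case from the question where the final length (for example 20) is larger than the last range's end.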
To benchmark the solution, let's make a matrix big enough:
set.seed(10)
m <- sample.int(n=1e6, size=5e5)
m <- matrix(sort(m), ncol=2, byrow=TRUE)
head(m)
     [,1] [,2]
[1,]    1    3
[2,]    4    5
[3,]    9   10
[4,]   11   13
[5,]   14   18
[6,]   22   23
system.time(ones <- find.ones(m))
   user  system elapsed
  1.167   0.000   1.167
Upvotes: 1