Reputation: 91
I have a matrix with 10M rows of integer values. A row in this matrix can look as follows:
1 1 1 1 2
I need to transform the row above to the following vector:
4 1 0 0 0 0 0 0 0
Another example:
1 2 3 4 5
To:
1 1 1 1 1 0 0 0 0
How can I do this efficiently in R?
Update:
There is a function that does exactly what I need, base::tabulate (suggested here before), but it is extremely slow: it took at least 15 minutes to go over my initial matrix.
Upvotes: 3
Views: 68
Reputation: 89097
I would try something like this:
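m <- nrow(x)
n <- ncol(x)
i.idx <- seq_len(m)
j.idx <- seq_len(n)
# One output column per possible value 1..max(x)
out <- matrix(0L, m, max(x))
for (j in j.idx) {
  # (row, value) pairs: one cell of out per input row
  ij <- cbind(i.idx, x[, j])
  out[ij] <- out[ij] + 1L
}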
A for loop might sound surprising in an answer to a question that asks for an efficient implementation. However, this solution is vectorized within each column and only loops over the five columns. This will be many, many times faster than looping over 10 million rows using apply.
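To make the column-wise update concrete, here is a minimal sketch run on the two example rows from the question. It assumes the output should always be 9 columns wide, as in the question's examples, rather than max(x); the name nbins is introduced here only for illustration.

```r
# The two example rows from the question
x <- rbind(c(1L, 1L, 1L, 1L, 2L),
           c(1L, 2L, 3L, 4L, 5L))
nbins <- 9L  # assumption: values lie in 1..9, matching the 9-wide output

out <- matrix(0L, nrow(x), nbins)
rows <- seq_len(nrow(x))
for (j in seq_len(ncol(x))) {
  # Each row of ij is a (row, value) pair. Indexing a matrix with a
  # two-column matrix addresses one cell per pair. Within a single column
  # every row index occurs exactly once, so no cell is hit twice in one
  # assignment and the increment is safe.
  ij <- cbind(rows, x[, j])
  out[ij] <- out[ij] + 1L
}
print(out)
# row 1: 4 1 0 0 0 0 0 0 0
# row 2: 1 1 1 1 1 0 0 0 0
```

Fixing the width up front also avoids a surprise when the largest value happens not to appear in the data, in which case max(x) would give a narrower result.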
Testing with:
n <- 1e7
m <- 5
x <- matrix(sample(1:9, n * m, replace = TRUE), n, m)
With this test data, the approach above takes less than six seconds, while a naive t(apply(x, 1, tabulate, 9)) takes close to two minutes.
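Before running on the full data, it may be worth checking on a small matrix that the two approaches agree. The sketch below wraps the loop in a function; the name count_by_column and its nbins argument are hypothetical, chosen only for this example, and the input size is kept small so that the apply version finishes quickly.

```r
# Hypothetical wrapper around the column-wise loop (name is illustrative)
count_by_column <- function(x, nbins = max(x)) {
  out <- matrix(0L, nrow(x), nbins)
  rows <- seq_len(nrow(x))
  for (j in seq_len(ncol(x))) {
    ij <- cbind(rows, x[, j])
    out[ij] <- out[ij] + 1L
  }
  out
}

set.seed(1)
x_small <- matrix(sample(1:9, 1000 * 5, replace = TRUE), 1000, 5)

# Reference result: tabulate each row, then transpose back to rows x bins
reference <- t(apply(x_small, 1, tabulate, nbins = 9))
result <- count_by_column(x_small, nbins = 9L)

# Should print TRUE if both approaches produce the same counts
print(isTRUE(all.equal(result, reference)))
```

Wrapping either call in system.time() would give a rough timing comparison on your own machine; the numbers will depend on hardware and R version.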
Upvotes: 2