YevgenyM

Reputation: 91

R efficient way to use values as indexes

I have a matrix with 10 million rows of integer values.

A row in this matrix can look as follows:

1 1 1 1 2

I need to transform the row above to the following vector:

4 1 0 0 0 0 0 0 0

Another example:

1 2 3 4 5

To:

1 1 1 1 1 0 0 0 0

How can I do this efficiently in R?

Update: There is a function that does exactly what I need, base::tabulate (suggested here before), but it is extremely slow: it took at least 15 minutes to go over my init matrix.
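For reference, here is a minimal sketch of that per-row tabulate approach applied to the two example rows above. The small matrix x and the choice of 9 bins are taken from the examples in the question; the variable name counts is only illustrative.

```r
# The two example rows from the question, stored as a small integer matrix.
x <- rbind(c(1L, 1L, 1L, 1L, 2L),
           c(1L, 2L, 3L, 4L, 5L))

# tabulate() counts how often each of the values 1..nbins occurs.
# apply() returns one column per input row, so transpose the result
# to get one output row per input row.
counts <- t(apply(x, 1, tabulate, nbins = 9))
print(counts)
```

This calls tabulate once per row, which is presumably why it becomes slow at 10 million rows.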

Upvotes: 3

Views: 68

Answers (1)

flodel

Reputation: 89097

I would try something like this:

m <- nrow(x)
n <- ncol(x)
i.idx <- seq_len(m)
j.idx <- seq_len(n)

out <- matrix(0L, m, max(x))

for (j in j.idx) {
   ij <- cbind(i.idx, x[, j])
   out[ij] <- out[ij] + 1L
} 

A for loop might sound surprising for a question that asks for an efficient implementation. However, each iteration is vectorized over all rows of one column, so the loop only runs five times, once per column. This will be many times faster than looping over 10 million rows using apply.
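To make a single iteration of the loop concrete, the sketch below runs one step by hand on the two example rows from the question. It relies on R's indexing of a matrix by a two-column matrix of (row, column) pairs; the fixed width of 9 columns is an assumption taken from the question's expected output rather than from max(x).

```r
# The two example rows from the question.
x <- rbind(c(1L, 1L, 1L, 1L, 2L),
           c(1L, 2L, 3L, 4L, 5L))

# One output row per input row, one column per possible value (assumed 1..9).
out <- matrix(0L, nrow(x), 9L)

# Column 5 of x holds the values 2 and 5, so the index matrix addresses
# the cells (row 1, column 2) and (row 2, column 5).
ij <- cbind(seq_len(nrow(x)), x[, 5])

# Each row number appears exactly once in ij, so this increment is safe.
out[ij] <- out[ij] + 1L
print(out)
```

Because a given row appears only once per column of x, no cell is addressed twice within a single assignment, which is what makes the vectorized increment correct.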

Testing with:

n <- 1e7  # number of rows
m <- 5    # number of columns
x <- matrix(sample(1:9, n * m, replace = TRUE), n, m)

this approach takes less than six seconds while a naive t(apply(x, 1, tabulate, 9)) takes close to two minutes.
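As a sanity check, the two approaches can be compared on a smaller sample. The sketch below wraps the loop in a helper; the name count_values and its nbins argument are hypothetical additions, not part of the original answer. Passing nbins = 9 explicitly guarantees nine output columns even if the value 9 never occurs in the data. It assumes every value in x lies between 1 and nbins.

```r
# Hypothetical helper wrapping the column-wise loop from the answer.
# nbins is an added parameter; it defaults to max(x) as in the original code.
count_values <- function(x, nbins = max(x)) {
  rows <- seq_len(nrow(x))
  out <- matrix(0L, nrow(x), nbins)
  for (j in seq_len(ncol(x))) {
    ij <- cbind(rows, x[, j])   # (row, value) pairs for column j
    out[ij] <- out[ij] + 1L
  }
  out
}

set.seed(1)  # arbitrary seed, only for reproducibility
x <- matrix(sample(1:9, 1e4 * 5, replace = TRUE), 1e4, 5)

fast <- count_values(x, nbins = 9L)
slow <- t(apply(x, 1, tabulate, nbins = 9))

# Both results should have the same shape and the same counts.
stopifnot(all(dim(fast) == dim(slow)), all(fast == slow))
```

To compare timings on your own machine, each call can be wrapped in system.time().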

Upvotes: 2
