Reputation: 282
I need to count the number of occurrences of each value in the entire matrix.
My matrix consists only of "0", "1" and "2", and I need the result to be the total number of occurrences of each of these values.
If this is my matrix:
The result should be:
0 -> 10
1 -> 9
2 -> 11
I am looking for an answer that uses apply. I feel like I've searched half of the internet, but I haven't stumbled across an answer that is understandable or easy to reproduce.
I ultimately want to make this apply work in parallel, but I think I know how to do that part.
I should mention that my matrix is 30e6 x 18, and table is not working, I guess because of a memory issue.
Upvotes: 1
Views: 3234
Reputation: 11336
If your matrix is large, then it makes sense to compute counts rowwise or columnwise to conserve memory. apply is a valid way to go about this.
Conceptually, this answer is not unlike the one I provided here for data frames. I will once again recommend that you use tabulate instead of table; it is really much more efficient.
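To see what tabulate is doing here, a minimal sketch on a small character vector: factor maps the values onto integer codes, and tabulate counts those codes directly instead of building a full contingency table.
x <- c("0", "2", "2", "1", "0")
table(x)                                             # named contingency table
## x
## 0 1 2
## 2 1 2
tabulate(factor(x, levels = c("0", "1", "2")), 3L)   # bare integer counts
## [1] 2 1 2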
set.seed(1L)
m <- 5L
n <- 4L
A <- matrix(sample(c("0", "1", "2"), size = m * n, replace = TRUE), m, n)
A
[,1] [,2] [,3] [,4]
[1,] "0" "2" "2" "1"
[2,] "2" "2" "0" "1"
[3,] "0" "1" "0" "1"
[4,] "1" "1" "0" "2"
[5,] "0" "2" "1" "0"
f <- function(x, levels) tabulate(factor(x, levels), length(levels))
rowSums(apply(A, 1L, f, c("0", "1", "2"))) # if 'A' has more columns than rows
## [1] 7 7 6
rowSums(apply(A, 2L, f, c("0", "1", "2"))) # if 'A' has more rows than columns
## [1] 7 7 6
You are going to want apply to loop over the smaller dimension of your matrix, so choose the second argument accordingly. If your matrix actually has millions of rows and only 18 columns, then use the second statement above, not the first.
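If you would rather not hard-code that choice, a small sketch that picks the margin from the dimensions of A (reusing f and the levels from above):
levs <- c("0", "1", "2")
margin <- if (nrow(A) >= ncol(A)) 2L else 1L  # loop over the smaller dimension
setNames(rowSums(apply(A, margin, f, levs)), levs)
## 0 1 2
## 7 7 6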
Here is a test using a matrix with your dimensions. It only takes ~10 seconds on my machine, so parallelization might be overkill.
set.seed(1L)
m <- 3e+07L
n <- 18L
A <- matrix(sample(c("0", "1", "2"), m * n, replace = TRUE), m, n)
system.time(rowSums(apply(A, 2L, f, c("0", "1", "2"))))
## user system elapsed
## 8.195 2.816 12.322
Just for fun:
library("parallel")
system.time(Reduce(`+`, mclapply(seq_len(n), function(i) f(A[, i], c("0", "1", "2")), mc.cores = 4L)))
## user system elapsed
## 3.924 0.904 3.497
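Note that mclapply relies on forking, so on Windows mc.cores > 1 is effectively ignored. A rough PSOCK-cluster sketch of the same idea (it copies A to every worker, which may be costly for a matrix this size):
library("parallel")
cl <- makeCluster(4L)
clusterExport(cl, c("A", "f"))  # ship the matrix and the counting function to the workers
res <- Reduce(`+`, parLapply(cl, seq_len(ncol(A)),
                             function(i) f(A[, i], c("0", "1", "2"))))
stopCluster(cl)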
Upvotes: 3
Reputation: 887741
The usual way would be to unlist (if it is a data.frame) or to concatenate with c (if it is a matrix, to convert it to a vector) and then apply table.
table(unlist(m1))
0 1 2
10 9 11
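For a matrix specifically, flattening with c (or as.vector) before calling table gives the same counts, for example:
table(c(m1))
 0  1  2
10  9 11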
If the dataset is really big, we can loop. Here, the number of columns seems to be 30e6 and the number of rows 18. We may loop over the rows with a for loop and update a temporary counts vector:
out <- setNames(rep(0, 3), 0:2)
for (i in seq_len(nrow(m1))) {
  tmp <- table(m1[i, ])
  out[names(tmp)] <- out[names(tmp)] + tmp
}
out
0 1 2
10 9 11
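A variant of the same accumulation idea that uses tabulate on blocks of columns, so each iteration only materialises a slice of the matrix; a sketch with an arbitrary block size of 1e6 columns:
block <- 1e6L
out <- setNames(rep(0, 3), 0:2)
for (s in seq(1L, ncol(m1), by = block)) {
  idx <- s:min(s + block - 1L, ncol(m1))                 # columns in this block
  out <- out + tabulate(factor(m1[, idx], levels = 0:2), 3L)
}
out
 0  1  2
10  9 11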
If we need a parallel option, we can use dapply with parallel = TRUE, and do a colSums on the output generated:
library(collapse)
colSums(dapply(m1, MARGIN = 1,
               FUN = \(x) table(factor(x, levels = 0:2)),
               parallel = TRUE))
X1 X2 X3
10 9 11
m1 <- structure(c(2, 1, 1, 2, 2, 0, 1, 1, 2, 1, 1, 0, 0, 2, 0, 2, 0,
1, 2, 2, 0, 0, 0, 2, 0, 1, 2, 2, 0, 1), .Dim = c(10L, 3L), .Dimnames = list(
NULL, c("X1", "X2", "X3")))
Upvotes: 3