user3147662

Reputation: 155

Why does this vectorized matrix comparison fail?

I am trying to compare the first row of a matrix with all rows of the same matrix, but the vectorized comparison does not return the results I expect. Why might this be happening?

m <- matrix(c(1,2,3,1,2,4), nrow=2, ncol=3, byrow=TRUE)

> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    4

> # Why does the first row not have 3 TRUE values?
> m[1,] == m
      [,1]  [,2]  [,3]
[1,]  TRUE FALSE FALSE
[2,] FALSE FALSE FALSE

> m[1,] == m[1,]
[1] TRUE TRUE TRUE

> m[1,] == m[2,]
[1]  TRUE  TRUE FALSE

Follow-up. My actual data has a large number of rows (at least 10 million), so both time and memory add up. Are there any additional suggestions beyond the approaches proposed by others, timed below?

m <- matrix(rep(c(1,2,3), 1000000), ncol=3, byrow=TRUE)

> #by @alexis_laz
> m1 <- matrix(m[1,], nrow = nrow(m), ncol = ncol(m), byrow = TRUE)
> system.time(m == m1)
   user  system elapsed 
   0.21    0.03    0.31

> object.size(m1)
24000112 bytes

> #by @PaulHiemstra
> system.time( t(apply(m, 1, function(x) x == m[1,])) )
   user  system elapsed 
  35.18    0.08   36.04 

Follow-up 2. @alexis_laz, you are correct. I want to compare every row with every other row, and have posted a follow-up question on that (How to vectorize comparing each row of matrix with all other rows).

Upvotes: 2

Views: 96

Answers (2)

Paul Hiemstra

Reputation: 60924

As @MatthewLundberg pointed out, R's recycling rules do not behave as you expected. In my opinion it is always better to state explicitly what to compare rather than rely on R's assumptions. One way to make the correct comparison:

t(apply(m, 1, function(x) x == m[1,]))
     [,1] [,2]  [,3]
[1,] TRUE TRUE  TRUE
[2,] TRUE TRUE FALSE

or:

m == rbind(m[1,], m[1,])
     [,1] [,2]  [,3]
[1,] TRUE TRUE  TRUE
[2,] TRUE TRUE FALSE

or by making R's recycling work in your favor (thanks to @Arun):

t(t(m) == m[1,])
     [,1] [,2]  [,3]
[1,] TRUE TRUE  TRUE
[2,] TRUE TRUE FALSE
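
Another way to make the comparison explicit without transposing is to repeat each element of the first row nrow(m) times, which lines the values up with R's column-major storage. This is a sketch rather than something proposed in the thread; `ref` and `res` are names introduced here for illustration, and the approach still allocates a vector as long as m.

```r
m <- matrix(c(1, 2, 3, 1, 2, 4), nrow = 2, ncol = 3, byrow = TRUE)

# Repeat each first-row value once per row of m: 1 1 2 2 3 3
ref <- rep(m[1, ], each = nrow(m))

# ref now lines up with m element by element in column order,
# so no recycling is involved in the comparison
res <- m == ref
res
```

Because `ref` has exactly length(m) elements, the result should agree with t(t(m) == m[1,]) above.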

Upvotes: 2

Matthew Lundberg

Reputation: 42639

In the comparison m[1,] == m, the first term m[1,] is recycled (once) to equal the length of m. The comparison is then done column-wise.

You're comparing c(1,2,3) with c(1,1,2,2,3,4), that is, c(1,2,3,1,2,3) with c(1,1,2,2,3,4), so you get one TRUE followed by five FALSE (packaged as a matrix to match the dimensions of m).
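
The recycling described above can be reproduced step by step. This is a minimal sketch using the question's matrix m; the names `lhs`, `rhs` and `res` are introduced here only for illustration.

```r
m <- matrix(c(1, 2, 3, 1, 2, 4), nrow = 2, ncol = 3, byrow = TRUE)

# R stores matrices column by column, so this is the order == sees
rhs <- as.vector(m)                          # 1 1 2 2 3 4

# m[1, ] is recycled to the length of m
lhs <- rep(m[1, ], length.out = length(m))   # 1 2 3 1 2 3

# Element-wise comparison, reshaped to the dimensions of m
res <- matrix(lhs == rhs, nrow = nrow(m))

# Check that the manual steps agree with what m[1, ] == m returns
identical(res, m[1, ] == m)
```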

Upvotes: 4
