Reputation: 155
I am trying to compare 1st row of a matrix with all rows of the same matrix. But the vectorized comparison is not returning correct results. Any reason why this may be happening?
m <- matrix(c(1,2,3,1,2,4), nrow=2, ncol=3, byrow=TRUE)
> m
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 4
> # Why does the first row not have 3 TRUE values?
> m[1,] == m
[,1] [,2] [,3]
[1,] TRUE FALSE FALSE
[2,] FALSE FALSE FALSE
> m[1,] == m[1,]
[1] TRUE TRUE TRUE
> m[1,] == m[2,]
[1] TRUE TRUE FALSE
Follow-up. In my actual data I have large number of rows then (atleast 10million) then both time and memory adds up. Additional suggestions on the below as suggested below by others?
m <- matrix(rep(c(1,2,3), 1000000), ncol=3, byrow=TRUE)
> #by @alexis_laz
> m1 <- matrix(m[1,], nrow = nrow(m), ncol = ncol(m), byrow = T)
> system.time(m == m1)
user system elapsed
0.21 0.03 0.31
> object.size(m1)
24000112 bytes
> #by @PaulHiemstra
> system.time( t(apply(m, 1, function(x) x == m[1,])) )
user system elapsed
35.18 0.08 36.04
Follow-up 2. @alexis_laz you are correct. I want to compare every row with each other and have posted a followup question on that ( How to vectorize comparing each row of matrix with all other rows)
Upvotes: 2
Views: 96
Reputation: 60924
As @MatthewLundberg pointed out, the recycling rules of R do not behave as you expected. In my opinion it is always better to explicitely state what to compare and not rely on R's assumptions. One way to make the correct comparison:
t(apply(m, 1, function(x) x == m[1,]))
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE FALSE
or:
m == rbind(m[1,], m[1,])
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE FALSE
or by making R's recyling working in your favor (thanks to @Arun):
t(t(m) == m[1,])
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE FALSE
Upvotes: 2
Reputation: 42639
In the comparison m[1,] == m
, the first term m[1,]
is recycled (once) to equal the length of m. The comparison is then done column-wise.
You're comparing c(1,2,3)
with c(1,1,2,2,3,4)
, thus c(1,2,3,1,2,3)
with c(1,1,2,2,3,3,4)
so you have one TRUE
followed by five FALSE
(and packaged as a matrix to match the dimensions of m
).
Upvotes: 4