Reputation: 1641
I have a binary vector that holds information on whether or not some event happened for some observation:
v <- c(0,1,1,0)
What I want to achieve is a matrix that holds information on all bivariate pairs of observations in this vector. That is, if two observations both have 0 or both have 1 in this vector v, they should get a 1 in the matrix. If one has 0 and the other has 1, they should get a 0.
Hence, the goal is this matrix:
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    1
[2,]    0    0    1    0
[3,]    0    1    0    0
[4,]    1    0    0    0
Whether the main diagonal is 0 or 1 does not matter to me.
Is there an efficient and simple way to achieve this that does not require a combination of if statements and for loops? v might be of considerable size.
Thanks!
Upvotes: 4
Views: 231
Reputation: 73265
If you allow the main diagonal to be 1, then this matrix has at most two distinct rows, v and 1 - v, no matter how large v is. Since the matrix is symmetric, the same holds for its columns. This makes the matrix trivial to construct.
## example `v`
set.seed(0)
v <- sample.int(2, 10, replace = TRUE) - 1L
#[1] 1 0 0 1 1 0 1 1 1 1
## column expansion from unique columns
cbind(v, 1 - v, deparse.level = 0L)[, 2 - v]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 0 0 1 1 0 1 1 1 1
# [2,] 0 1 1 0 0 1 0 0 0 0
# [3,] 0 1 1 0 0 1 0 0 0 0
# [4,] 1 0 0 1 1 0 1 1 1 1
# [5,] 1 0 0 1 1 0 1 1 1 1
# [6,] 0 1 1 0 0 1 0 0 0 0
# [7,] 1 0 0 1 1 0 1 1 1 1
# [8,] 1 0 0 1 1 0 1 1 1 1
# [9,] 1 0 0 1 1 0 1 1 1 1
#[10,] 1 0 0 1 1 0 1 1 1 1
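As a quick sanity check of the column expansion above, the result can be compared against a direct elementwise comparison. This is only a minimal sketch; the names expanded and reference are illustrative and not part of the original answer.

```r
set.seed(0)
v <- sample.int(2, 10, replace = TRUE) - 1L

# Column j is v when v[j] == 1 and 1 - v when v[j] == 0
expanded <- cbind(v, 1 - v, deparse.level = 0L)[, 2 - v]

# Direct pairwise comparison, converted from logical to 0/1
reference <- outer(v, v, `==`) + 0L

# Both constructions should agree entry by entry
stopifnot(all(expanded == reference))

# The expanded matrix has at most two distinct rows
nrow(unique(expanded))
```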
What is the purpose of this matrix?
If there are n0 zeros and n1 ones, the matrix has dimension (n0 + n1) x (n0 + n1), and exactly n0 x n0 + n1 x n1 of its entries are ones. That is always at least half of the entries, so the matrix is not sparse in the usual sense, but for a long vector v it is extremely redundant, as it has a large number of duplicated rows / columns.
Obviously, if you only want to store the positions of the 1s in this matrix, you can get them without forming the matrix at all.
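One way to read the remark above: the positions of the 1s are fully determined by which indices hold 0 and which hold 1, so those two index sets are all that needs storing. A rough sketch, assuming the 4-element v from the question; idx0, idx1, same_event and pos are hypothetical names introduced here.

```r
v <- c(0, 1, 1, 0)

# Indices of the two groups; together they need only O(length(v)) memory
idx0 <- which(v == 0)
idx1 <- which(v == 1)

# Entry (i, j) of the matrix can be answered on demand
same_event <- function(i, j) as.integer(v[i] == v[j])
same_event(1, 4)  # both observations are 0
same_event(1, 2)  # one is 0, the other is 1

# If explicit (row, col) positions of the 1s are needed, enumerate them
# group by group (diagonal included)
pos <- rbind(expand.grid(row = idx0, col = idx0),
             expand.grid(row = idx1, col = idx1))
```

Note that pos itself grows with n0 x n0 + n1 x n1, so for a long v the two index vectors (or the lookup function) are the compact representation.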
Upvotes: 2
Reputation: 6685
An alternative to outer, though slightly less efficient, is sapply:
out <- sapply(v, function(x){
x == v
})
diag(out) <- 0L
out
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    1
[2,]    0    0    1    0
[3,]    0    1    0    0
[4,]    1    0    0    0
microbenchmark on a vector of length 1000:
> test <- microbenchmark("LAP" = sapply(v, function(x){
+ x == v
+ }),
+ "markus" = outer(v, v, `==`), times = 1000, unit = "ms")
> test
Unit: milliseconds
   expr      min       lq     mean   median       uq       max neval
    LAP 3.973111 4.065555 5.747905 4.573002 6.324607 101.03498  1000
 markus 3.515725 3.535067 4.852606 3.694924 4.908930  84.85184  1000
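The benchmark above does not show how the length-1000 vector was built. A reproducible setup might look like the following sketch; the seed, the use of sample(), and base R's system.time() in place of microbenchmark are all assumptions made here, and timings will differ by machine.

```r
set.seed(1)                                  # arbitrary seed (assumption)
v <- sample(0:1, 1000, replace = TRUE)       # hypothetical test vector

res_sapply <- sapply(v, function(x) x == v)  # one comparison per element of v
res_outer  <- outer(v, v, `==`)              # single vectorised call

# Both approaches should produce the same 1000 x 1000 logical matrix
stopifnot(all(res_sapply == res_outer))

# Rough timings with base R only, repeated to smooth out noise
system.time(for (i in 1:20) sapply(v, function(x) x == v))
system.time(for (i in 1:20) outer(v, v, `==`))
```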
Upvotes: 2
Reputation: 388817
Another option is expand.grid: create all pairwise combinations of v with itself. Since the values are only 0 and 1, a pair matches exactly when its sum is 0 or 2 (0 + 0 and 1 + 1).
inds <- rowSums(expand.grid(v, v))
matrix(+(inds == 0 | inds == 2), nrow = length(v))
# [,1] [,2] [,3] [,4]
#[1,] 1 0 0 1
#[2,] 0 1 1 0
#[3,] 0 1 1 0
#[4,] 1 0 0 1
Since the diagonal elements are not important to you, I have left them as they are; if you want to change them, you can use diag as shown in @markus's answer.
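Because each pairwise sum can only be 0, 1 or 2, the condition inds == 0 | inds == 2 is equivalent to inds != 1. A small sketch of that simplification with the diagonal zeroed out, using the 4-element v from the question (m is a name introduced here):

```r
v <- c(0, 1, 1, 0)

# expand.grid varies its first column fastest, which matches the
# column-major order in which matrix() fills its values
inds <- rowSums(expand.grid(v, v))

# A pair matches unless exactly one of the two values is 1
m <- matrix(+(inds != 1), nrow = length(v))

# Optional: zero the diagonal to match the matrix in the question
diag(m) <- 0
m
```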
Upvotes: 2
Reputation: 26343
We can use outer
out <- outer(v, v, `==`)
diag(out) <- 0L # as you don't want to compare each element to itself
out
# [,1] [,2] [,3] [,4]
#[1,] 0 0 0 1
#[2,] 0 0 1 0
#[3,] 0 1 0 0
#[4,] 1 0 0 0
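Since v might be of considerable size, it may be worth estimating the memory of the full matrix before calling outer. R stores logical and integer values in 4 bytes each, so a vector of length n yields roughly 4 * n^2 bytes. A rough sketch; n and est_bytes are illustrative names, and the alternating test vector is an assumption:

```r
n <- 1000
v <- rep(c(0L, 1L), length.out = n)  # hypothetical test vector

est_bytes <- 4 * n^2                 # 4 bytes per logical/integer element

out <- outer(v, v, `==`)
diag(out) <- 0L                      # assigning 0L coerces the matrix to integer

est_bytes                            # 4 million bytes, about 4 MB
as.numeric(object.size(out))         # slightly more, due to object overhead
```

By the same estimate, a vector of length 1e5 would need about 40 GB, so for long vectors it is usually better to avoid materialising the matrix.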
Upvotes: 5