Ben Rollert

Reputation: 1584

Cleaner way of constructing binary matrix from vector

I have a fun challenge: I'm trying to construct a binary matrix from an integer vector. The binary matrix should have as many rows as the vector has elements, and as many columns as the maximum value in the vector. The ith row of the matrix corresponds to the ith element of the vector: it contains a 1 at position j, where j is the value of the ith element, and zeros everywhere else. If the value of the ith element is 0, then the whole ith row should be 0.

To make this a whole lot simpler, here is a working reproducible example:

set.seed(1)
playv <- sample(0:5, 20, replace = TRUE)  # sample integer vector

# start from a matrix whose ith row repeats playv[i] in every column
playmat <- matrix(playv, nrow = length(playv), ncol = max(playv))

for (i in seq_along(playv)) {
  pos <- as.integer(playmat[i, 1])
  playmat[i, pos] <- 1
  playmat[i, -pos] <- 0
}

head(playmat)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    0    1
[5,]    1    0    0    0    0
[6,]    0    0    0    0    1

The above solution is correct; I'm just looking for something cleaner and more robust.

Upvotes: 1

Views: 1235

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

You can, of course, also just use table:

> table(sequence(length(playv)), playv)
    playv
     0 1 2 3 4 5
  1  0 1 0 0 0 0
  2  0 0 1 0 0 0
  3  0 0 0 1 0 0
  4  0 0 0 0 0 1
  5  0 1 0 0 0 0
  6  0 0 0 0 0 1
  7  0 0 0 0 0 1
  8  0 0 0 1 0 0
  9  0 0 0 1 0 0
  10 1 0 0 0 0 0
  11 0 1 0 0 0 0
  12 0 1 0 0 0 0
  13 0 0 0 0 1 0
  14 0 0 1 0 0 0
  15 0 0 0 0 1 0
  16 0 0 1 0 0 0
  17 0 0 0 0 1 0
  18 0 0 0 0 0 1
  19 0 0 1 0 0 0
  20 0 0 0 0 1 0
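
Note that the table above has a column for 0, whereas the question asks for columns 1 to max only, with all-zero rows where the vector is 0. Here is a minimal sketch of one way to get that from table, assuming the vector holds non-negative integers with no NA values. The vector v and the names tab and m are made up for illustration and are not from the question.

```r
# Small illustrative vector (hypothetical, not the question's playv)
v <- c(1L, 0L, 3L, 5L, 1L)

# Restricting the factor levels to 1..max(v) turns every 0 into NA;
# table() skips NA by default, so those rows stay all zero.
tab <- table(seq_along(v), factor(v, levels = seq_len(max(v))))

# Convert the table to a plain integer matrix, keeping the column labels
m <- matrix(as.integer(tab), nrow = nrow(tab),
            dimnames = list(NULL, colnames(tab)))
m
```

Because the levels are fixed up front, a value that never occurs (2 or 4 here) still gets its own all-zero column.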

If speed is a concern, I would suggest a manual approach. First, identify the unique values in your vector. Second, create an empty matrix to fill in. Third, use matrix indexing to identify the positions that should be filled in as 1.

Like this:

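f3 <- function(vec) {
  U <- sort(unique(vec))                 # distinct values, one column each
  M <- matrix(0, nrow = length(vec),     # empty matrix to fill in
              ncol = length(U),
              dimnames = list(NULL, U))
  # matrix indexing: set row i, column matching vec[i], to 1
  M[cbind(seq_along(vec), match(vec, U))] <- 1
  M
}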

Usage would be f3(playv).

Adding f3 to the benchmarks from DrDom's answer (which defines f1, f2, and v), we get:

library(microbenchmark)
microbenchmark(f1(v), f2(v), f3(v), times = 10)
# Unit: milliseconds
#   expr       min        lq    median        uq       max neval
#  f1(v) 2104.4808 3151.4308 3314.8173 3344.6696 4023.5246    10
#  f2(v) 3956.5678 4782.7863 5994.4448 6320.1901 6646.0405    10
#  f3(v)  486.4406  574.1133  746.9112  927.3407  987.9121    10
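
The same matrix-indexing idea can be adapted to the exact layout the question describes (one column per value from 1 to max, and an all-zero row wherever the vector is 0). This is only a sketch: the helper name to_binary_matrix is hypothetical, it assumes non-negative integers with no NA values, and it has not been benchmarked against the functions above.

```r
# Hypothetical helper (name not from this thread).
# Assumes vec holds non-negative integers and no NA values.
to_binary_matrix <- function(vec) {
  vec <- as.integer(vec)
  # one column per value 1..max(vec); the extra 0L guards an empty input
  M <- matrix(0L, nrow = length(vec), ncol = max(vec, 0L))
  keep <- which(vec > 0L)            # rows that need a 1 somewhere
  M[cbind(keep, vec[keep])] <- 1L    # row i gets a 1 in column vec[i]
  M
}

to_binary_matrix(c(1, 0, 3))
```

Rows whose value is 0 are simply never indexed, so they keep their initial zeros.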

Upvotes: 4

DrDom

Reputation: 4123

set.seed(1)
playv <- sample(0:5,20,replace=TRUE)
playv <- as.character(playv)
results <- model.matrix(~playv-1)

You can rename the columns of results afterwards.
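
As a rough sketch of the renaming step: model.matrix prefixes each column name with the variable name, and a character vector is turned into a factor whose levels sort as text (so "10" comes before "2"). Assuming the values are numeric, building the factor explicitly keeps the columns in numeric order. The names v, f, and res are made up for illustration.

```r
# Small illustrative vector (hypothetical, not the playv from above)
v <- c(2, 10, 0, 2)

# Explicit levels keep numeric order: 0, 2, 10 rather than "0", "10", "2"
f <- factor(v, levels = sort(unique(v)))

res <- model.matrix(~ f - 1)   # columns come out named f0, f2, f10
colnames(res) <- levels(f)     # rename to the bare values
res
```

Like the code above, this still produces a column for 0, and only values that actually occur get a column.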

I like the solution provided by Ananda Mahto and compared it to model.matrix. Here is the code:

library(microbenchmark)

set.seed(1)
v <- sample(1:10,1e6,replace=TRUE)

f1 <- function(vec) {
  vec <- as.character(vec)
  model.matrix(~vec-1)
}

f2 <- function(vec) {
  table(sequence(length(vec)), vec)
}

microbenchmark(f1(v), f2(v), times=10)

model.matrix was a little bit faster than table:

Unit: seconds
  expr      min       lq   median       uq      max neval
 f1(v) 2.890084 3.147535 3.296186 3.377536 3.667843    10
 f2(v) 4.824832 5.625541 5.757534 5.918329 5.966332    10

Upvotes: 4
