Reputation: 1584
I have a fun challenge: I'm trying to construct a binary matrix from an integer vector. The binary matrix should contain as many rows as the length of the vector, and as many columns as the max value in the integer vector. The ith row of the matrix corresponds to the ith element of the vector: it contains a 1 at position j, where j is the value of the ith element, and zeros everywhere else. If the value of the ith element is 0, then the whole ith row should be 0.
To make this a whole lot simpler, here is a working reproducible example:
set.seed(1)
playv <- sample(0:5, 20, replace = TRUE)  # sample integer vector
playmat <- matrix(playv, nrow = length(playv), ncol = max(playv))  # create matrix from vector
for (i in seq_along(playv)) {
  pos <- as.integer(playmat[i, 1])
  playmat[i, pos] <- 1
  playmat[i, -pos] <- 0
}
head(playmat)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    0    1
[5,]    1    0    0    0    0
[6,]    0    0    0    0    1
The above solution is correct; I'm just looking for something more robust.
Upvotes: 1
Views: 1235
Reputation: 193507
You can, of course, also just use table:
> table(sequence(length(playv)), playv)
    playv
     0 1 2 3 4 5
  1  0 1 0 0 0 0
  2  0 0 1 0 0 0
  3  0 0 0 1 0 0
  4  0 0 0 0 0 1
  5  0 1 0 0 0 0
  6  0 0 0 0 0 1
  7  0 0 0 0 0 1
  8  0 0 0 1 0 0
  9  0 0 0 1 0 0
  10 1 0 0 0 0 0
  11 0 1 0 0 0 0
  12 0 1 0 0 0 0
  13 0 0 0 0 1 0
  14 0 0 1 0 0 0
  15 0 0 0 0 1 0
  16 0 0 1 0 0 0
  17 0 0 0 0 1 0
  18 0 0 0 0 0 1
  19 0 0 1 0 0 0
  20 0 0 0 0 1 0
If speed is a concern, I would suggest a manual approach. First, identify the unique values in your vector. Second, create an empty matrix to fill in. Third, use matrix indexing to identify the positions that should be filled in as 1.
Like this:
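f3 <- function(vec) {
  U <- sort(unique(vec))                    # one column per distinct value
  M <- matrix(0, nrow = length(vec),
              ncol = length(U),
              dimnames = list(NULL, U))     # empty matrix to fill in
  # Matrix indexing: each (row, column) pair marks a cell to set to 1
  M[cbind(seq_along(vec), match(vec, U))] <- 1L
  M
}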
Usage would be f3(playv).
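Note that f3 creates one column per distinct value, so a 0 in the input gets its own column. If the layout described in the question is wanted instead (columns 1 to max, with an all-zero row for each 0), the same matrix-indexing idea can be adapted. This is only a minimal sketch, assuming a non-empty vector of non-negative whole numbers; the name one_hot is hypothetical and not part of the answer above:

```r
# Hypothetical helper (not from the original answer).
# Assumes `vec` is a non-empty vector of non-negative whole numbers.
one_hot <- function(vec) {
  M <- matrix(0L, nrow = length(vec), ncol = max(vec))
  keep <- vec > 0                          # rows whose value is 0 stay all-zero
  M[cbind(which(keep), vec[keep])] <- 1L   # set cell (i, vec[i]) to 1
  M
}

one_hot(c(1, 0, 3))
```

For the input c(1, 0, 3) this gives a 3 x 3 matrix whose second row is all zeros. Values between 1 and the maximum that never occur still get a column, which f3 and table would leave out.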
Adding that to the benchmarks (f1, f2, and v as defined in the model.matrix answer), we get:
library(microbenchmark)
microbenchmark(f1(v), f2(v), f3(v), times = 10)
# Unit: milliseconds
#   expr       min        lq    median        uq       max neval
#  f1(v) 2104.4808 3151.4308 3314.8173 3344.6696 4023.5246    10
#  f2(v) 3956.5678 4782.7863 5994.4448 6320.1901 6646.0405    10
#  f3(v)  486.4406  574.1133  746.9112  927.3407  987.9121    10
Upvotes: 4
Reputation: 4123
set.seed(1)
playv <- sample(0:5,20,replace=TRUE)
playv <- as.character(playv)
results <- model.matrix(~playv-1)
You may rename the columns in results.
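As one possible way to do the renaming, the prefix that model.matrix puts in front of each column name (the variable name, here playv) can be stripped with sub. A small sketch, using a fixed input vector (an assumption made for illustration) instead of the random sample so that the result does not depend on the R version:

```r
playv <- as.character(c(1, 0, 3, 5, 1))    # small fixed example input (assumed)
results <- model.matrix(~ playv - 1)       # columns are named playv0, playv1, ...

# Strip the leading variable name so only the value remains
colnames(results) <- sub("^playv", "", colnames(results))
colnames(results)
```

After this, the column names are just the values "0", "1", "3" and "5".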
I like the solution provided by Ananda Mahto and compared it to model.matrix. Here is the code:
library(microbenchmark)
set.seed(1)
v <- sample(1:10,1e6,replace=TRUE)
f1 <- function(vec) {
  vec <- as.character(vec)
  model.matrix(~vec-1)
}
f2 <- function(vec) {
  table(sequence(length(vec)), vec)
}
microbenchmark(f1(v), f2(v), times=10)
model.matrix was a little bit faster than table:
Unit: seconds
  expr      min       lq   median       uq      max neval
 f1(v) 2.890084 3.147535 3.296186 3.377536 3.667843    10
 f2(v) 4.824832 5.625541 5.757534 5.918329 5.966332    10
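One thing to watch when comparing the two outputs: f1 converts the values to character first, so its columns are ordered as strings (typically 1, 10, 2, ...), whereas table orders numeric values numerically. The sketch below, on a small made-up vector, checks that both give the same 0/1 pattern once the columns are matched by name. The exact string ordering may depend on the locale, so treat the column order in the comments as an assumption.

```r
v_small <- c(3, 10, 1, 2, 10, 7)           # made-up input for illustration

vec <- as.character(v_small)
m1 <- model.matrix(~ vec - 1)              # columns named vec1, vec10, vec2, ...
colnames(m1) <- sub("^vec", "", colnames(m1))

m2 <- table(seq_along(v_small), v_small)   # columns in numeric order: 1, 2, 3, 7, 10

# Reorder m1's columns to match m2 before comparing cell by cell
same <- all(as.vector(m1[, colnames(m2)]) == as.vector(m2))
same
```

Matching columns by name rather than by position avoids silently comparing the column for 10 against the column for 2.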
Upvotes: 4