Reputation: 21
I need to generate a logical matrix in R. The dimensions of the matrix are dynamic: both the column names and the row names come from vectors.
vector1 <- c(a,b,c,d)
vector2 <- c(a,c,e,f,g)
vector3 <- c(d,f,g,z)
and so on...
It should iterate over each vector and set the vector name as the row name. If a vector value is found among the matrix column names, set the corresponding cell to 1; otherwise add a new column to the matrix and assign 1 to that cell. The matrix values are either 1 or 0. It should work like this:
        a b c d e f g z
vector1 1 1 1 1 0 0 0 0
vector2 1 0 1 0 1 1 1 0
vector3 0 0 0 1 0 1 1 1
This is just a simple demo; in practice, each vector is very large.
Upvotes: 0
Views: 2644
Reputation: 42544
Although I am late to the party, I would like to suggest a different approach that uses dcast(). The OP has disclosed that both column names and row names of the matrix come from vectors and that actually the size of each vector is very large. She has given sample data
vector1 <- c(a,b,c,d)
vector2 <- c(a,c,e,f,g)
vector3 <- c(d,f,g,z)
where the column names are not valid character strings. Each column name needs to be wrapped in quotes (as done in the other answer), which would be tedious for large vectors.
Therefore, I suggest storing the row names rn and the column names cn of the matrix in a compact and handy form:
rn cn
vector1 a,b,c,d
vector2 a,c,e,f,g
vector3 d,f,g,z
either in a file or in a character string. Here, cn contains the names of the matrix columns, separated by commas.
This "sparse matrix definition" can be read with fread(), e.g.:
library(data.table)
sparse <- fread("
rn cn
vector1 a,b,c,d
vector2 a,c,e,f,g
vector3 d,f,g,z
")
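If the definition lives in a file rather than in a character string, reading it might look like the sketch below. The temporary file and the name sparse_from_file are made up for illustration. Setting sep and header explicitly is my assumption about a robust call, so that fread() does not have to guess between the space and the comma.

```r
library(data.table)

# Hypothetical file: a temporary file stands in for a real definition file
path <- tempfile(fileext = ".txt")
writeLines(c("rn cn",
             "vector1 a,b,c,d",
             "vector2 a,c,e,f,g",
             "vector3 d,f,g,z"),
           path)

# One space separates rn from cn; the commas stay inside the cn field
sparse_from_file <- fread(file = path, sep = " ", header = TRUE)
sparse_from_file
```

Both columns should come back as character, with one row per matrix row.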
Building the matrix from this definition requires two steps. First, the column names need to be extracted for each row name. This is accomplished using strsplit():
long <- sparse[, strsplit(cn, ","), by = rn]
long
# rn V1
# 1: vector1 a
# 2: vector1 b
# 3: vector1 c
# 4: vector1 d
# 5: vector2 a
# 6: vector2 c
# 7: vector2 e
# 8: vector2 f
# 9: vector2 g
#10: vector3 d
#11: vector3 f
#12: vector3 g
#13: vector3 z
This returns the sparse matrix information in long format. Note that V1 now contains the names of the matrix columns as character strings, which saves us from wrapping them in quotes manually.
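As a variation on the split step, the new column can be given a name of its own instead of the default V1. This is only a sketch: the column name col and the object name long_named are my own choices, and sparse is rebuilt with data.table() here so that the snippet runs on its own.

```r
library(data.table)

# Same definition as above, built directly so the snippet is self-contained
sparse <- data.table(
  rn = c("vector1", "vector2", "vector3"),
  cn = c("a,b,c,d", "a,c,e,f,g", "d,f,g,z")
)

# Split each comma-separated string per row name;
# fixed = TRUE treats "," as a literal separator rather than a regex
long_named <- sparse[, .(col = unlist(strsplit(cn, ",", fixed = TRUE))), by = rn]
long_named
```

With a named column, the later reshape formula would read rn ~ col instead of rn ~ V1.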
Now, the OP is expecting the result in wide format with 0 or 1 indicating absence or presence of the respective column. The reshape can be accomplished using dcast():
result <- dcast(long, rn ~ V1, length)
result
# rn a b c d e f g z
#1: vector1 1 1 1 1 0 0 0 0
#2: vector2 1 0 1 0 1 1 1 0
#3: vector3 0 0 0 1 0 1 1 1
Or, with both steps nested in a single call:
result <- dcast(sparse[, strsplit(cn, ","), by = rn], rn ~ V1, length)
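One caveat worth sketching: length counts occurrences, so a column name that is listed twice for the same row would show up as 2 rather than 1. Assuming strict 0/1 flags are wanted, dropping duplicate pairs with unique() before the reshape is one way to guard against that. The data long_dup below is made up purely to show the effect.

```r
library(data.table)

# Made-up long-format data in which "a" is listed twice for vector1
long_dup <- data.table(
  rn = c("vector1", "vector1", "vector1", "vector2"),
  V1 = c("a", "a", "b", "b")
)

# Plain counting: the duplicate shows up as 2
counts <- dcast(long_dup, rn ~ V1, fun.aggregate = length, value.var = "V1")

# Removing duplicate (rn, V1) pairs first keeps every cell at 0 or 1
flags <- dcast(unique(long_dup), rn ~ V1, fun.aggregate = length, value.var = "V1")

counts
flags
```

For the OP's sample data there are no duplicates, so both variants would give the same table.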
Now, the result can be converted from a data.table into a matrix with appropriate row names:
mat <- as.matrix(result[, .SD, .SDcols = -c("rn")])
rownames(mat) <- result[, rn]
mat
# a b c d e f g z
#vector1 1 1 1 1 0 0 0 0
#vector2 1 0 1 0 1 1 1 0
#vector3 0 0 0 1 0 1 1 1
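Since the question asks for a logical matrix, a further conversion may be wanted. The sketch below assumes a data.table version whose as.matrix() method accepts the rownames argument, and it rebuilds result by hand so that it runs on its own. The name lmat is my own.

```r
library(data.table)

# The wide result from above, rebuilt by hand to keep the snippet self-contained
result <- data.table(
  rn = c("vector1", "vector2", "vector3"),
  a = c(1L, 1L, 0L), b = c(1L, 0L, 0L), c = c(1L, 1L, 0L), d = c(1L, 0L, 1L),
  e = c(0L, 1L, 0L), f = c(0L, 1L, 1L), g = c(0L, 1L, 1L), z = c(0L, 0L, 1L)
)

# Use the rn column as row names in a single step
mat <- as.matrix(result, rownames = "rn")

# Comparing against 0 yields TRUE/FALSE and keeps the dimnames
lmat <- mat > 0L
lmat
```

Whether 0/1 or TRUE/FALSE is the better choice depends on what the matrix is used for afterwards.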
Upvotes: 1
Reputation: 32548
#DATA
vector1 = c("a", "b", "c", "d")
vector2 = c("a", "c", "e", "f", "g")
vector3 = c("d", "f", "g", "z")
#Get all vectors in a list
temp = mget(paste("vector", 1:3, sep = ""))
#You could do sequence(length(ls(pattern = "vector"))) instead of 1:3
#1) As pointed out in the comments by akrun, use `mtabulate` of `qdapTools` package
library(qdapTools)
mtabulate(temp)
# a b c d e f g z
#vector1 1 1 1 1 0 0 0 0
#vector2 1 0 1 0 1 1 1 0
#vector3 0 0 0 1 0 1 1 1
#2) Or if you want to do it in base R
#2-i) as pointed out by akrun
table(stack(temp)[2:1]) #also check data.frame(unclass(table(stack(temp)[2:1])))
# values
#ind a b c d e f g z
# vector1 1 1 1 1 0 0 0 0
# vector2 1 0 1 0 1 1 1 0
# vector3 0 0 0 1 0 1 1 1
#2-ii)
#Get the unique values
temp2 = unique(unlist(temp))
#For each vector, flag which of the unique values it contains, then bind the rows
setNames(object = data.frame(do.call(rbind, lapply(temp, function(a)
    as.numeric(temp2 %in% a)))),
    nm = temp2)
# a b c d e f g z
#vector1 1 1 1 1 0 0 0 0
#vector2 1 0 1 0 1 1 1 0
#vector3 0 0 0 1 0 1 1 1
Upvotes: 4