Joan

Reputation: 21

How to create a matrix dynamically in R?

I need to generate a logical matrix in R. The dimensions of the matrix are dynamic: both the column names and the row names of the matrix come from vectors.

vector1 <- c(a,b,c,d)
vector2 <- c(a,c,e,f,g)
vector3 <- c(d,f,g,z)

and so on...

It should iterate over each vector and use the vector's name as the row name. If a value of the vector is already a column name of the matrix, the corresponding cell is set to 1; otherwise, a new column is added to the matrix and that cell is set to 1. All matrix values are either 1 or 0, so it should work like this:

        a  b  c  d  e  f  g  z
vector1 1  1  1  1  0  0  0  0
vector2 1  0  1  0  1  1  1  0
vector3 0  0  0  1  0  1  1  1

This is just a simple demo; in practice, each vector is very large.
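The procedure described above can be translated almost literally into base R. This is a minimal sketch, not part of the original question: it assumes the values are quoted character strings collected in a named list (the names `vecs` and `build_incidence()` are hypothetical), and because it grows the matrix one row and column at a time it will be slow for very large vectors.

```r
# Hypothetical helper: build a 0/1 matrix by iterating over the vectors and
# adding a column the first time a value is seen.
build_incidence <- function(vecs) {
  m <- matrix(0L, nrow = 0, ncol = 0)
  for (nm in names(vecs)) {
    v <- unique(vecs[[nm]])
    # Values not yet present as column names become new all-zero columns
    new_cols <- v[!v %in% colnames(m)]
    if (length(new_cols) > 0) {
      m <- cbind(m, matrix(0L, nrow = nrow(m), ncol = length(new_cols),
                           dimnames = list(rownames(m), new_cols)))
    }
    # New row named after the vector, with 1 in every column whose name is in v
    new_row <- matrix(0L, nrow = 1, ncol = ncol(m),
                      dimnames = list(nm, colnames(m)))
    new_row[1, v] <- 1L
    m <- rbind(m, new_row)
  }
  m
}

# Sample data from the question, with the values quoted
vecs <- list(vector1 = c("a", "b", "c", "d"),
             vector2 = c("a", "c", "e", "f", "g"),
             vector3 = c("d", "f", "g", "z"))
build_incidence(vecs)
#        a b c d e f g z
#vector1 1 1 1 1 0 0 0 0
#vector2 1 0 1 0 1 1 1 0
#vector3 0 0 0 1 0 1 1 1
```

Repeated `cbind()`/`rbind()` copies the whole matrix on every iteration, so the vectorised approaches in the answers are preferable for large inputs.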

Upvotes: 0

Views: 2644

Answers (2)

Uwe

Reputation: 42544

Although I'm late to the party, I would like to suggest two different approaches concerning

  • storing and reading of the dynamic, "sparse matrix" information
  • and creating the matrix using dcast().

Storing and reading the dynamic matrix information

The OP has stated that both the column names and the row names of the matrix come from vectors and that, in practice, each vector is very large. She has given the sample data

vector1 <- c(a,b,c,d)
vector2 <- c(a,c,e,f,g)
vector3 <- c(d,f,g,z)

where the column names are not valid character strings. Each column name would need to be wrapped in quotes (as done in the other answer), which would be tedious for large vectors.

Therefore, I suggest storing the row names rn and the column names cn of the matrix in a compact, handy form:

rn      cn
vector1 a,b,c,d
vector2 a,c,e,f,g
vector3 d,f,g,z

either in a file or in a character string. cn contains the names of the matrix columns, separated by commas.

This "sparse matrix definition" can then be read, e.g., with fread() from the data.table package:

library(data.table)
sparse <- fread("
rn      cn
vector1 a,b,c,d
vector2 a,c,e,f,g
vector3 d,f,g,z
")

Creating the matrix

This requires two steps. First, the column names need to be extracted for each row name. This is accomplished using strsplit():

long <- sparse[, strsplit(cn, ","), by = rn]

long
#         rn V1
# 1: vector1  a
# 2: vector1  b
# 3: vector1  c
# 4: vector1  d
# 5: vector2  a
# 6: vector2  c
# 7: vector2  e
# 8: vector2  f
# 9: vector2  g
#10: vector3  d
#11: vector3  f
#12: vector3  g
#13: vector3  z

This returns the sparse matrix information in long format. Note that V1 now contains the names of the matrix columns as character strings, which saves us from wrapping them in quotes manually.
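For comparison, the same long format can be produced in base R without data.table. This is a sketch rather than part of the original answer; it assumes the definition is held in a plain data.frame (the names sparse_df, parts and long_df are hypothetical) and repeats each row name once per extracted column name using lengths():

```r
# Compact definition as a plain data.frame (character columns, not factors)
sparse_df <- data.frame(
  rn = c("vector1", "vector2", "vector3"),
  cn = c("a,b,c,d", "a,c,e,f,g", "d,f,g,z"),
  stringsAsFactors = FALSE
)

# Split each cn entry into its individual column names
parts <- strsplit(sparse_df$cn, ",")

# One row per (row name, column name) pair, mirroring the data.table result
long_df <- data.frame(
  rn = rep(sparse_df$rn, lengths(parts)),
  V1 = unlist(parts),
  stringsAsFactors = FALSE
)
long_df
```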

The OP expects the result in wide format, with 0 or 1 indicating the absence or presence of the respective column. The reshape can be accomplished using dcast() with length as the aggregation function, which counts the entries in each cell:

result <- dcast(long, rn ~ V1, length)

result
#        rn a b c d e f g z
#1: vector1 1 1 1 1 0 0 0 0
#2: vector2 1 0 1 0 1 1 1 0
#3: vector3 0 0 0 1 0 1 1 1

Or, combining both steps in a single nested call:

result <- dcast(sparse[, strsplit(cn, ","), by = rn], rn ~ V1, length)
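Since length counts how often each column name occurs for a row name, a name repeated within one cn entry would yield 2 rather than 1. If strictly 0/1 values are required, one option (an assumption, not taken from the original answer) is an aggregation function that only records presence. The input below is a hypothetical example in which a appears twice for vector1:

```r
library(data.table)

# Hypothetical definition in which "a" appears twice for vector1
dup <- data.table(rn = c("vector1", "vector2"),
                  cn = c("a,b,a", "b,c"))
dup_long <- dup[, strsplit(cn, ","), by = rn]

# length() counts occurrences, so vector1 gets a = 2
counts <- dcast(dup_long, rn ~ V1, length, value.var = "V1")

# Presence-only aggregation: any occurrence becomes 1. Empty cells are filled
# with the function applied to a zero-length vector, which gives 0L here.
flags <- dcast(dup_long, rn ~ V1, function(x) as.integer(length(x) > 0L),
               value.var = "V1")
counts
flags
```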

Now, the result can be converted from data.table into a matrix with appropriate row names:

mat <- as.matrix(result[, .SD, .SDcols = -c("rn")])
rownames(mat) <- result[, rn]
mat
#        a b c d e f g z
#vector1 1 1 1 1 0 0 0 0
#vector2 1 0 1 0 1 1 1 0
#vector3 0 0 0 1 0 1 1 1
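Recent data.table versions also let as.matrix() take a rownames argument naming the column to use as row names, which folds the last two lines into one call. A sketch, assuming an installed data.table release that supports this argument; the definition is built with data.table() here only to keep the example self-contained, and mat2 is a hypothetical name:

```r
library(data.table)

sparse <- data.table(rn = c("vector1", "vector2", "vector3"),
                     cn = c("a,b,c,d", "a,c,e,f,g", "d,f,g,z"))
result <- dcast(sparse[, strsplit(cn, ","), by = rn], rn ~ V1, length,
                value.var = "V1")

# Use the rn column as row names; it is dropped from the matrix columns
mat2 <- as.matrix(result, rownames = "rn")
mat2
```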

Upvotes: 1

d.b

Reputation: 32548

#DATA
vector1 = c("a", "b", "c", "d")
vector2 = c("a", "c", "e", "f", "g")
vector3 = c("d", "f", "g", "z")

#Get all vectors in a list
temp = mget(paste("vector", 1:3, sep = ""))
#You could do sequence(length(ls(pattern = "vector"))) instead of 1:3

#1) As pointed out in the comments by akrun, use `mtabulate` from the `qdapTools` package
library(qdapTools)
mtabulate(temp)
#        a b c d e f g z
#vector1 1 1 1 1 0 0 0 0
#vector2 1 0 1 0 1 1 1 0
#vector3 0 0 0 1 0 1 1 1

#2) Or if you want to do it in base R

  #2-i) as pointed out by akrun
  table(stack(temp)[2:1]) #also check data.frame(unclass(table(stack(temp)[2:1])))
  #         values
  #ind       a b c d e f g z
  #  vector1 1 1 1 1 0 0 0 0
  #  vector2 1 0 1 0 1 1 1 0
  #  vector3 0 0 0 1 0 1 1 1

  #2-ii)
  #Get the unique values
  temp2 = unique(unlist(temp))

  setNames(object = data.frame(do.call(rbind, lapply(temp, function(a)
      as.numeric(temp2 %in% a)))),
      nm = temp2)
  #        a b c d e f g z
  #vector1 1 1 1 1 0 0 0 0
  #vector2 1 0 1 0 1 1 1 0
  #vector3 0 0 0 1 0 1 1 1
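Neither base R result is a plain matrix: table() returns a table object with named dimnames, and 2-ii) returns a data.frame. If a plain integer matrix is preferred, as the question asks for, a sketch along the same lines could look like this. Using vapply() instead of lapply() is a substitution, not the answer's code, and mat_a, mat_b and tab are hypothetical names:

```r
vector1 <- c("a", "b", "c", "d")
vector2 <- c("a", "c", "e", "f", "g")
vector3 <- c("d", "f", "g", "z")
temp <- mget(paste0("vector", 1:3))

# Column names in order of first appearance
temp2 <- unique(unlist(temp))

# One 0/1 column per vector, then transpose so the vectors become rows
mat_a <- t(vapply(temp, function(a) as.integer(temp2 %in% a),
                  integer(length(temp2))))
colnames(mat_a) <- temp2

# Alternatively, strip the table class and the dimnames names from table()
tab <- table(stack(temp)[2:1])
mat_b <- matrix(tab, nrow = nrow(tab), dimnames = unname(dimnames(tab)))
mat_a
mat_b
```

Note that temp2 keeps the order of first appearance (matching the incremental procedure in the question), whereas table() sorts the column names; for this sample data both orders coincide.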

Upvotes: 4
