Sarah

Creating a sparse matrix in R

I have a data frame Likes for n users and m likes, with userid and likeid1, ..., likeidm as my variables. The user ids are stored in column 1 (Likes$userid), and each remaining cell contains 1 or 0 depending on whether the user liked the page with the respective like id.

library(Matrix)

Likes <- data.frame(userid=c("n1","n2"),
                      m1=c(0,1),
                      m2=c(0,0),
                      m3=c(0,0),
                      m4=c(1,0)
                      )

Likes[1, 1:5]

  userid m1 m2 m3 m4
1     n1  0  0  0  1

Now I want to create a sparse matrix. How would I specify j in the following code? I know the way I did it is wrong, since the like ids are not stored in a column; they are already the variable names of my data frame.

sM_Likes <- sparseMatrix(Likes, i=likes$userid, j=1,c(2:ncol(Likes)), x=1)

Thanks in advance (and please excuse the very basic question).

Answers (1)

Hack-R

I tried to reproduce the problem by constructing an object like the one you described in the question (which I've now edited into the question) and by appending some additional fake rows to it.

library(Matrix)

Likes <- data.frame(userid=c("n1","n2"),
                      m1=c(0,1),
                      m2=c(0,0),
                      m3=c(0,0),
                      m4=c(1,0)
                      )

I found that running your code on this threw a different error:

sM_Likes <- sparseMatrix(Likes, i=likes$userid, j=1,c(2:ncol(Likes)), x=1)

Error in sparseMatrix(Likes, i = likes$userid, j = 1, c(2:ncol(Likes)), : exactly one of 'i', 'j', or 'p' must be missing from call

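I mentioned this a couple of times in the comments as what I thought was causing the problem. Because i, j and x are passed by name, the unnamed Likes is matched positionally to sparseMatrix's p argument (and c(2:ncol(Likes)) to dims), so i, j and p are all supplied at once. You corrected the specification of your j argument and now it works :)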
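For anyone landing here later, this is a minimal sketch of one way to write the corrected call, assuming the wide 0/1 layout of the toy Likes frame above. It is not necessarily the exact fix from the comments. The idea is to take the row/column positions of the 1s with which(..., arr.ind = TRUE) and pass everything by name, so nothing falls through to p. The helper names vals and idx are just illustrative.

```r
library(Matrix)

Likes <- data.frame(userid = c("n1", "n2"),
                    m1 = c(0, 1),
                    m2 = c(0, 0),
                    m3 = c(0, 0),
                    m4 = c(1, 0))

# 0/1 part of the data frame as a plain numeric matrix (userid column dropped)
vals <- as.matrix(Likes[, -1])

# Row and column indices of every cell that holds a 1
idx <- which(vals == 1, arr.ind = TRUE)

# i, j, x, dims and dimnames are all named, so no argument is matched to p;
# x = 1 is recycled to one value per (i, j) pair
sM_Likes <- sparseMatrix(i = idx[, "row"],
                         j = idx[, "col"],
                         x = 1,
                         dims = dim(vals),
                         dimnames = list(as.character(Likes$userid),
                                         colnames(vals)))
sM_Likes
```

Rows of the result correspond to users and columns to like ids, which is what i = userid was aiming for in the original call; passing dims explicitly keeps all-zero rows or columns from being dropped.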

There's also a follow-up question you asked in the comments about column names. I think this should solve that:

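# mltools is also available from CRAN: install.packages("mltools")
devtools::install_github("ben519/mltools")
library(mltools)
library(data.table)  # needed for data.table(); loading mltools does not attach it

# Toy table with integer, numeric, logical, ordered-factor and unordered-factor columns
dt <- data.table(
  intCol=c(1L, NA_integer_, 3L, 0L),
  realCol=c(NA, 2, NA, NA),
  logCol=c(TRUE, FALSE, TRUE, FALSE),
  ofCol=factor(c("a", "b", NA, "b"), levels=c("a", "b", "c"), ordered=TRUE),
  ufCol=factor(c("a", NA, "c", "b"), ordered=FALSE)
)

sparsify(dt)                                      # default settings
sparsify(dt, sparsifyNAs=TRUE)                    # convert NAs to 0s before sparsifying
sparsify(dt[, list(realCol)], naCols="identify")  # add a binary column marking the NAs
sparsify(dt[, list(realCol)], naCols="efficient")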
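
If the follow-up was only about keeping the user ids and like ids as labels on the Likes matrix itself (my assumption; the comments are not reproduced here), a plain Matrix alternative is to convert the 0/1 columns to a dense matrix, set the row names from userid, and coerce it with Matrix(..., sparse = TRUE), which carries the dimnames over. The name sM_named is illustrative.

```r
library(Matrix)

Likes <- data.frame(userid = c("n1", "n2"),
                    m1 = c(0, 1),
                    m2 = c(0, 0),
                    m3 = c(0, 0),
                    m4 = c(1, 0))

# Dense numeric 0/1 matrix; its column names are the like ids m1..m4
m <- as.matrix(Likes[, -1])

# Label the rows with the user ids
rownames(m) <- as.character(Likes$userid)

# Dense -> sparse; row and column names are kept on the result
sM_named <- Matrix(m, sparse = TRUE)
dimnames(sM_named)
```

Building the dense matrix first is fine for a toy example, but for a large n x m Likes table the sparseMatrix(i, j, ...) route avoids ever materialising the zeros.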
