R - how to convert long-data dataframe to sparse matrix

Question

I have a large dataframe, 305k rows, with a two keys and a data column as follows:

I am trying to convert this to a sparse matrix using the following code in R:

#convert to factors
data$RID   = as.factor(data$RID)
data$HID   = as.factor(data$HID)
data$VALUE = as.numeric(data$VALUE)
str(data)

#remove nas
data = na.omit(data)

#create sparse matrix
X = with(data,sparseMatrix(i=RID, 
                           j=HID, 
                           x=VALUE,
                           dimnames=list(levels(RID), levels(HID))))

Which is producing the following error message:

Error in sparseMatrix(i = RID, j = HID, x = VALUE, dimnames = list(levels(RID),  : 
  NA's in (i,j) are not allowed
In addition: Warning messages:
1: In Ops.factor(i, !(m.i || i1)) : ‘+’ not meaningful for factors
2: In Ops.factor(j, !(m.j || i1)) : ‘+’ not meaningful for factors

I have removed NAs so i'm unsure why the error-NAs is appearing? It also has reference to '+'s within the factors but i have checked all 36k factors and there are no '+'s there?

Does anyone know what the solution is?

I have included a snapshot of the first 20 rows of data below so you can re-produce the issue:

"RID" "HID" "VALUE"
"361838" "620631" 76.55
"361838" "620671" 82.61
"361838" "620787" 57.73
"361838" "621146" 58.65
"361838" "637825" 64.15
"361838" "637859" 82.79
"361838" "641254" 50.38
"361838" "642105" 72.88
"361838" "646469" 45.79
"361838" "648400" 82.06
"395855" "301340" -5.12
"395855" "649304" 41.88
"395855" "650324" -30.83
"395855" "657458" 46.47
"395855" "658028" -0.53
"395855" "659504" 28.84
"395855" "660506" 29.03
"395855" "660519" 14.16
"395855" "660521" -38.17
"395855" "660547" 35.45

Although when i look at the factors, i get the following:

> str(data)
'data.frame':   20 obs. of  3 variables:
 $ RID  : Factor w/ 30608 levels "361838","395855",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ HID  : Factor w/ 37399 levels "2018","7990",..: 11604 11624 11709 11740 14031 14049 15086 15457 16821 17270 ...
 $ VALUE: num  76.5 82.6 57.7 58.6 64.2 ...

R - how to convert long-data dataframe to sparse matrix

Answers (1)

Related Questions