Reputation: 163
I have a data.table of roughly 20,000 x 20,000. How do I convert it to a matrix efficiently in terms of speed and memory?
I tried m = as.matrix(dt), but it takes very long and produces many warnings. df = data.frame(dt) also takes very long and ends up hitting the memory limit.
Is there an efficient way to do this? Or simply a function in data.table that returns dt as a matrix (as required to feed into a statistical model using the glmnet package)?
Simply wrapping dt in as.matrix() gives me the error below:
x = as.matrix(dt)
Error: cannot allocate vector of size 2.9 Gb
In addition: Warning messages:
1: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)
2: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)
3: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)
4: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)
My OS: 64-bit Windows 7 with 8 GB of RAM. Windows Task Manager has shown Rgui.exe using more than 4 GB before, and that was still fine.
Upvotes: 16
Views: 10940
Reputation: 56259
@GibsonGay:
The mistake was mine: I included a character column in the conversion, which promoted every column of the matrix to character. After removing that column, an integer matrix was built instead; the conversion succeeded without errors or warnings, and the model ran fine.
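To illustrate the point above, here is a minimal sketch on a small hypothetical table (the column names id, a and b are made up for illustration). It shows the promotion to character, then keeps only the numeric columns before converting.

```r
library(data.table)

# Small stand-in for the real table; `id` plays the stray character column.
dt <- data.table(id = c("r1", "r2", "r3"), a = 1:3, b = 4:6)

# With the character column included, every cell is promoted to character.
m_bad <- as.matrix(dt)
typeof(m_bad)

# Keep only the numeric (integer or double) columns, then convert.
num_cols <- names(dt)[sapply(dt, is.numeric)]
m <- as.matrix(dt[, num_cols, with = FALSE])
typeof(m)
dim(m)
```

As a rough check on memory: 20,000 x 20,000 is 4e8 cells, so an integer matrix needs about 1.6 GB (4 bytes per cell), while a character matrix on 64-bit R needs about twice that for the 8-byte pointers alone. That is in the same range as the 2.9 Gb allocation that failed, though the exact figure depends on the real dimensions.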
Upvotes: 2
Reputation: 830
Try:
result <- as.matrix(tidytext::cast_sparse(dat_table,
                                          column_name_of_rows,
                                          column_name_of_columns,
                                          column_name_of_values))
It should be very efficient and fast.
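Note that this approach expects long-format data, with one row per (row, column, value) triple. The sketch below shows the same idea using the Matrix package directly rather than tidytext; the data frame `long` and its column names are hypothetical. As far as I know, glmnet also accepts sparse matrices from the Matrix package as `x`, in which case the final as.matrix() call could be skipped to avoid allocating the dense matrix at all.

```r
library(Matrix)

# Hypothetical long-format input: one row per non-zero cell.
long <- data.frame(
  row   = c("r1", "r1", "r2", "r3"),
  col   = c("a",  "c",  "b",  "a"),
  value = c(1, 2, 3, 4)
)

# Map row and column labels to integer positions.
rows <- factor(long$row)
cols <- factor(long$col)

# Build the sparse matrix; cells not listed in `long` are implicit zeros.
sp <- sparseMatrix(
  i = as.integer(rows),
  j = as.integer(cols),
  x = long$value,
  dimnames = list(levels(rows), levels(cols))
)

# Densify only if the downstream function really needs a base matrix.
dense <- as.matrix(sp)
```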
Upvotes: 4