Gibson Gay

Reputation: 163

Convert data from data.table to matrix efficiently (speed and memory)

I have a data.table of roughly 20,000 x 20,000. How do I convert it from a data.table to a matrix efficiently, in terms of both speed and memory?

I tried m = as.matrix(dt), but it takes very long and produces many warnings. df = data.frame(dt) also takes very long and hits the memory limit.

Is there an efficient way to do this? Or is there a function in data.table that returns dt in matrix form (as required to feed into a statistical model using the glmnet package)?

Simply wrapping it in as.matrix gives me the error below:

    x = as.matrix(dt)

    Error: cannot allocate vector of size 2.9 Gb
    In addition: Warning messages:
      1: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)
      2: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)
      3: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)
      4: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)

My OS: 64-bit Windows 7 with 8 GB of RAM. Windows Task Manager has shown Rgui.exe using more than 4 GB before, and it was still fine.

Upvotes: 16

Views: 10940

Answers (2)

zx8754

Reputation: 56259

@GibsonGay:

I made the mistake of including a character column in the matrix, which coerced the whole matrix to character. Removing that column allowed an integer matrix to be created; the conversion then succeeded without errors or warnings, and the model ran fine.
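To illustrate the point above, here is a minimal sketch on a small, made-up table (the column names `id`, `x1` and `x2` are hypothetical, not from the original data). It shows how a single character column changes the type of the whole matrix, and one way to keep only the numeric columns before converting. The size estimate at the end is a rough back-of-the-envelope figure, assuming 8 bytes per cell for a character or double matrix on 64-bit R and 4 bytes per cell for an integer matrix.

```r
library(data.table)

# Hypothetical toy table: one character id column plus integer features.
dt <- data.table(id = c("a", "b", "c"), x1 = 1:3, x2 = 4:6)

# Including the character column coerces every cell to character.
m_all <- as.matrix(dt)
typeof(m_all)

# Keep only the numeric columns before converting.
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]
m_num <- as.matrix(dt[, num_cols, with = FALSE])
typeof(m_num)

# Rough size estimate in GiB for a 20,000 x 20,000 matrix
# (assumption: 8 bytes per cell for character/double, 4 for integer).
gib_8_bytes <- 20000 * 20000 * 8 / 2^30
gib_4_bytes <- 20000 * 20000 * 4 / 2^30
```

Under those assumptions an integer matrix needs about half the memory of a character one, which may be why the conversion fit into RAM once the character column was dropped.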

Upvotes: 2

P. Denelle

Reputation: 830

Try:

    result <- as.matrix(tidytext::cast_sparse(dat_table,
                                              column_name_of_rows,
                                              column_name_of_columns,
                                              column_name_of_values))

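This should be efficient and fast. Note that cast_sparse() expects the data in long format (one row per row/column/value triple) and returns a sparse matrix; the outer as.matrix() then turns it into a dense one. glmnet also accepts sparse matrices, so the as.matrix() step can be skipped if memory is tight.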
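A minimal sketch of that call on made-up long-format data (the data frame `long_dt` and its column names are hypothetical, chosen only for illustration):

```r
library(tidytext)

# Hypothetical long-format data: one row per (row, column, value) triple.
long_dt <- data.frame(
  sample  = c("s1", "s1", "s2"),
  feature = c("f1", "f2", "f1"),
  value   = c(1, 2, 3)
)

# Build a sparse matrix with samples as rows and features as columns.
sp <- cast_sparse(long_dt, sample, feature, value)

# Densify only if the downstream code really needs a base matrix.
dense <- as.matrix(sp)
```

Combinations that do not appear in the long table (here s2/f2) become zeros in the result.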

Upvotes: 4
