RoelPi

Reputation: 23

Maximum columns for cast functions in R

I am trying to cast a data table to a matrix.

The long-format data.table has several hundred thousand rows, and I would like to reshape it into a wide-format matrix.

I have created the following example:

library(stringi)
library(data.table)
library(reshape2)

x <- 500
test <- data.table(first=stri_rand_strings(x,5),
                   second=stri_rand_strings(x,5),
                   third=runif(x),
                   fourth= runif(x))
testMatrix <- acast(test,first~second,
                    value.var = "third",
                    fun.aggregate = mean,
                    fill=0)

As x increases, the snippet eventually fails with the following error:

Error in eval(substitute(expr), envir, enclos) : n must be a positive integer
In addition: Warning message: In split_indices(.group, .n) : NAs introduced by coercion to integer range

It's not giving me a memory limit warning. What is happening here? Why is it happening? Are there other limits besides RAM for matrices or cast functions?

Thank you in advance
Roel

Upvotes: 0

Views: 661

Answers (1)

manotheshark

Reputation: 4357

You're likely running out of memory, even though that's not the error being displayed. Setting x <- 5e3 creates a matrix with 25 million elements that takes up about 191 MB. Setting x <- 5e4 would create a matrix with 2.5 billion elements, which by rough extrapolation would take around 19 GB.
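The arithmetic behind those figures can be sketched in a few lines of base R. This assumes a dense numeric matrix at 8 bytes per cell and that nearly all x random strings are distinct, so the result is roughly x by x. The helper names estimate_cells and estimate_gib are made up for illustration. The last line is a guess at why the message talks about the "integer range" rather than memory: 2.5 billion cells is more than .Machine$integer.max, so a cell count coerced to integer would become NA. I have not checked this against the reshape2 source.

```r
# Hypothetical helpers: rough size of a dense x-by-x numeric matrix.
# Assumes about x distinct row keys and x distinct column keys.
estimate_cells <- function(x) as.numeric(x) * as.numeric(x)

# 8 bytes per double, converted to GiB (1024^3 bytes).
estimate_gib <- function(x) estimate_cells(x) * 8 / 1024^3

cells_small <- estimate_cells(5e3)  # 25 million cells
gib_small   <- estimate_gib(5e3)    # roughly 0.19 GiB, i.e. about 191 MiB
cells_large <- estimate_cells(5e4)  # 2.5 billion cells
gib_large   <- estimate_gib(5e4)    # roughly 18.6 GiB

# Does the cell count still fit in a 32-bit integer?
fits_in_integer <- cells_large <= .Machine$integer.max  # FALSE

print(c(gib_small = gib_small, gib_large = gib_large))
print(fits_in_integer)
```

On real data, replacing x with the number of distinct values in each key column, for example length(unique(test$first)) and length(unique(test$second)), gives the same estimate before calling acast.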

Upvotes: 1
