Wasabi

Reputation: 3071

Using large hash tables in R

I'm trying to use the hash package, which I understand is the most commonly adopted hash-table implementation in R (other than using environments directly).

If I try to create and store hashes larger than ~20MB, I start getting protect(): protection stack overflow errors.

pryr::object_size(hash::hash(1:120000, 1:120000))  # * (see end of post)
#> 21.5 MB
h <- hash::hash(1:120000, 1:120000)
#> Error: protect(): protection stack overflow

If I run the h <- ... command once, the error appears only once. If I run it twice, I get an endless stream of errors in the console, freezing RStudio and forcing me to kill it from the Task Manager.

From multiple other SO questions, I understand this means I'm creating more pointers than R can protect. That makes sense to me: hashes are really just environments (which are themselves hash tables), so I assume R has to track each value in the hash table as a separate pointer.
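For reference, here is a minimal base-R sketch of the environment-as-hash-table idea that hash wraps (the key names are illustrative):

e <- new.env(hash = TRUE, size = 120000L)  # environment backed by a hash table
assign("key1", 1L, envir = e)              # insert a key/value pair
get("key1", envir = e)
#> [1] 1
exists("key2", envir = e, inherits = FALSE)
#> [1] FALSE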

The common solution I've seen for the protect() error is to launch with rstudio.exe --max-ppsize=500000 (which I assume propagates that option to R itself), but it doesn't help here; the error remains. That is somewhat surprising, since the hash in the example above has only 120,000 keys/pointers, far fewer than the given ppsize of 500,000.
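One way to narrow this down is to run the same call in a plain R session from a terminal; if the error disappears there, the problem lies with the IDE rather than with R itself (a sketch, assuming R is on the PATH):

# From a terminal, not the RStudio console:
R --max-ppsize=500000 -e 'h <- hash::hash(1:120000, 1:120000); print(length(h))'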

So, how can I use large hashes in R? I'm assuming that switching to plain environments won't help, since hash is really just a wrapper around environments.


* For the record, the hash::hash() call above creates a hash with non-syntactic names, but that's irrelevant: my real case has simple character keys and integer values and shows the same behavior.

Upvotes: 3

Views: 894

Answers (1)

user2554330

Reputation: 44957

This is a bug in RStudio, not a limitation in R. It happens when RStudio tries to examine the h object for display in the Environment pane. The bug is tracked at https://github.com/rstudio/rstudio/issues/5546 .
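If upgrading isn't an option, one hypothetical workaround, assuming the freeze is triggered by the Environment pane inspecting the object, is to keep the large hash out of the global environment, for example behind a closure; the workspace then holds only a small function, so the pane has nothing large to walk (make_store and lookup are illustrative names, not part of the hash API):

make_store <- function(keys, values) {
  h <- hash::hash(keys, values)  # the big hash stays in this closure's environment
  function(key) h[[key]]         # expose only a small accessor function
}
lookup <- make_store(as.character(1:120000), 1:120000)
lookup("42")
#> [1] 42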

Upvotes: 4
