Reputation: 3071
I'm trying to use the hash package, which I understand is the most commonly adopted hash-table implementation in R (other than using environments directly).
If I try to create and store hashes larger than ~20 MB, I start getting protect(): protection stack overflow errors.
pryr::object_size(hash::hash(1:120000, 1:120000)) # * (see end of post)
#> 21.5 MB
h <- hash::hash(1:120000, 1:120000)
#> Error: protect(): protection stack overflow
If I run the h <- ... command once, the error appears only once. If I run it twice, I get an endless stream of errors in the console, freezing RStudio and forcing me to kill it from the Task Manager.
From multiple other SO questions, I understand this means I'm creating more pointers than R can protect. This makes sense to me, since hashes are actually just environments (which themselves are just hash tables), so I assume R needs to keep track of each value in the hash table as a separate pointer.
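(Just to illustrate what I mean by environments being hash tables, here is a plain-environment sketch; the keys and values are made up:)

# Plain-environment sketch (made-up keys/values): environments are hashed
# internally, and the hash package essentially wraps this mechanism.
e <- new.env(hash = TRUE)          # a hashed environment
assign("key1", 42L, envir = e)     # store a value under a character key
get("key1", envir = e)             # retrieve it: 42
exists("missing", envir = e)       # membership test: FALSE
ls(e)                              # list all keys: "key1"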
The common solution I've seen for the protect() error is to launch with rstudio.exe --max-ppsize=500000 (which I assume passes that option through to R itself), but it doesn't help here; the error remains. This is somewhat surprising, since the hash in the example above has only 120,000 keys/pointers, well below the given ppsize of 500,000.
So, how can I use large hashes in R? I'm assuming changing to pure environments won't help, since hash is really just a wrapper around environments.
* For the record, the hash::hash() call above creates a hash with non-syntactic names, but that's irrelevant: my real case has simple character keys and integer values and shows the same behavior; a rough sketch of that shape is below.
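(Hypothetical illustration of the shape of my real case, not my actual data or key names:)

# Hypothetical sketch: character keys and integer values, same shape as my
# real case (my actual keys/values come from data, not paste0()).
keys   <- paste0("id_", seq_len(120000))   # simple character keys
values <- seq_len(120000)                  # integer values
h <- hash::hash(keys, values)              # triggers the same error in RStudio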
Upvotes: 3
Views: 894
Reputation: 44957
This is a bug in RStudio, not a limitation in R. It happens when RStudio tries to examine the h object for display in the Environment pane. The bug is tracked on their issue list at https://github.com/rstudio/rstudio/issues/5546.
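Until that is fixed, one possible workaround (untested, and assuming the pane only inspects objects bound directly in the global environment) is to keep the large hash out of the global environment, for example inside a closure:

# Untested sketch: keep the big hash inside a function's environment so the
# RStudio Environment pane has nothing large to inspect at the top level.
make_lookup <- function(n = 120000) {
  h <- hash::hash(as.character(seq_len(n)), seq_len(n))  # hash lives in the function's environment
  function(key) h[[key]]                                 # closure that looks up a single key
}
lookup <- make_lookup()
lookup("42")   # returns 42L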
Upvotes: 4