Reputation: 740
I have written a recursive code in R. Before invoking R, I set the stack size to 96 MB at the shell with:
ulimit -s 96000
I invoked R with maximum protection pointer stack size of 500000 with:
R --max-ppsize 500000
And I changed the maximum recursion depth to 500000:
options(expression = 500000)
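The resulting limits can also be checked from inside the running R session:
Cstack_info()              # C stack size and current usage
getOption("expressions")   # evaluation depth limit currently in effect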
I used both the binary R package from the Arch Linux repositories (built without memory profiling) and a binary I compiled myself with memory profiling enabled. Both are version 3.4.2.
I tried two versions of the code, with and without gc().
The problem is that R exits with a "node stack overflow" error while only 16 MB of the 93 MB of available stack is in use and the evaluation depth is just below one percent of the expressions option of 5e5:
size current direction eval_depth
93388800 16284704 1 4958
Error: node stack overflow
The change in current stack usage between the last two iterations was around 10 KB. The only object passed and saved is a numeric vector of 19 items.
The recursive portion of the code is below:
network_recursive <- function(called)
{
  print(Cstack_info())
  callers <- list_caller[[called + 1]]    # get the callers of the called
  callers <- callers[!bool[callers + 1]]  # subset for nofriends - new friends
  new_friend_no <- length(callers)        # number of new friends
  print(list(called, callers))
  if (new_friend_no > 0)                  # if1: still new friends
  {
    friends <<- friends + new_friend_no   # increment friend count
    print(friends)
    bool[callers + 1] <<- T               # toggle friends
    sapply(callers, network_recursive)    # recurse network control
  }                                       # close if1
  print("end of recursion")
}
What may be the reason for this stack overflow?
Some notes on the R source code related to the issue:
The check that triggers the error is at lines 5987-5988 of src/main/eval.c:
5975 #ifdef USE_BINDING_CACHE
5976 if (useCache) {
5977 R_len_t n = LENGTH(constants);
5978 # ifdef CACHE_MAX
5979 if (n > CACHE_MAX) {
5980 n = CACHE_MAX;
5981 smallcache = FALSE;
5982 }
5983 # endif
5984 # ifdef CACHE_ON_STACK
5985 /* initialize binding cache on the stack */
5986 vcache = R_BCNodeStackTop;
5987 if (R_BCNodeStackTop + n > R_BCNodeStackEnd)
5988 nodeStackOverflow();
5989 while (n > 0) {
5990 SETSTACK(0, R_NilValue);
5991 R_BCNodeStackTop++;
5992 n--;
5993 }
5994 # else
5995 /* allocate binding cache and protect on stack */
5996 vcache = allocVector(VECSXP, n);
5997 BCNPUSH(vcache);
5998 # endif
5999 }
6000 #endif
Upvotes: 2
Views: 1646
Reputation: 1061
The node stack has its own limit, which is fixed (defined in Defn.h as R_BCNODESTACKSIZE). If you have a real example where the limit is too small, please submit a bug report; we could increase it or add a command-line option for it. The "node stack" is used by the byte-code interpreter, which interprets byte-code produced by the byte-code compiler. Cstack_info() does not display node stack usage; the node stack is not allocated on the C stack.
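To see what "byte-compiled" looks like in practice, a small example:
f <- function(x) x + 1
cf <- compiler::cmpfun(f)   # explicitly byte-compile the closure
cf                          # printing shows a <bytecode: 0x...> tag; this compiled form runs on the node stack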
Programs based on deep recursion will be very slow in R anyway, as function calls are quite expensive. For practical purposes, when a limit related to recursion depth is hit, it is usually better to rewrite the program to avoid recursion rather than to keep increasing the limits.
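A sketch of such a rewrite for the traversal in the question, using an explicit worklist instead of the call stack (it assumes the same globals list_caller, bool and friends as above, and that only the friend count matters, not the depth-first visiting order):
network_iterative <- function(start)
{
  queue <- start                            # explicit worklist replaces the call stack
  while (length(queue) > 0)
  {
    called <- queue[1]
    queue <- queue[-1]
    callers <- list_caller[[called + 1]]    # get the callers of the called
    callers <- callers[!bool[callers + 1]]  # keep only the new friends
    new_friend_no <- length(callers)
    if (new_friend_no > 0)
    {
      friends <<- friends + new_friend_no   # increment friend count
      bool[callers + 1] <<- TRUE            # mark them as seen
      queue <- c(queue, callers)            # enqueue instead of recursing
    }
  }
}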
Just as an experiment, one might disable the just-in-time compiler and thereby reduce the stress on the node stack. The stress won't be eliminated completely, because some packages are byte-compiled at installation by default, including the base and recommended packages, so e.g. sapply is compiled. On the other hand, this shifts more work to the AST interpreter and its recursion-related limits (the C stack and the expressions option), and the program will run even slower.
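For the experiment, the JIT can be switched off for the session from within R, or via an environment variable before starting R:
compiler::enableJIT(0)                 # disable just-in-time compilation of closures
## or, at the shell, before invoking R:
## R_ENABLE_JIT=0 R --max-ppsize 500000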
Upvotes: 1
Reputation: 11
Off the top of my head, I see that you used options(expression = 500000), but the field in the list returned by options() is called 'expressions' (with an s). If you typed it the way you described in your question, then the 'expressions' option remained at its default of 5000, not the 500000 you intended to set. This might be why you hit a limit while using only what you thought was 1% of the allowed evaluation depth.
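For instance, with the trailing s the setting actually takes effect and can be verified:
options(expressions = 500000)   # note the s; 500000 is the documented maximum
getOption("expressions")        # now returns 500000 instead of the default 5000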
Upvotes: 1