Khris
Khris

Reputation: 3212

Increase recursion limit and stack size in python 2.7

I'm working with large trees and need to increase the recursion limit on Python 2.7.

Using sys.setrecursionlimit(10000) crashes my kernel, so I figured I needed to increase the stack size.

However I don't know how large the stack size should be. I tried 100 MiB like this threading.stack_size(104857600), but the kernel still dies. Giving it 1 GiB throws an error.

I haven't worked with the threading module yet so am I using it wrong when I just put the above statement at the beginning of my script? I'm not doing any kind of parallel processing, everything is done in the same thread.

My computer has 128 GB of physical RAM, running Windows 10, iPython console in Spyder.

The error displayed is simply:

Kernel died, restarting

Nothing more.

EDIT:

Full code to reproduce the problem. The building of the tree works well thought it takes quite long, the kernel only dies during the recursive execution of treeToDict() when reading the whole tree into a dictionary. Maybe there is something wrong with the code of that function. The tree is a non-binary tree:

import pandas as pd
import threading
import sys
import random as rd
import itertools as it
import string
threading.stack_size(104857600)
sys.setrecursionlimit(10000)

class treenode:
    # class to build the tree
    def __init__(self,children,name='',weight=0,parent=None,depth=0):
        self.name = name
        self.weight = weight
        self.children = children
        self.parent = parent
        self.depth = depth
        self.parentname = parent.name if parent is not None else ''

def add_child(node,name):
    # add element to the tree
    # if it already exists at the given node increase weight
    # else add a new child
    for i in range(len(node.children)):
        if node.children[i].name == name:
            node.children[i].weight += 1
            newTree = node.children[i]
            break
    else:
        newTree = treenode([],name=name,weight=1,parent=node,depth=node.depth+1)
        node.children.append(newTree)
    return newTree

def treeToDict(t,data):
    # read the tree into a dictionary
    if t.children != []:
        for i in range(len(t.children)):
            data[str(t.depth)+'_'+t.name] = [t.name, t.children[i].name, t.depth, t.weight, t.parentname]
    else:
        data[str(t.depth)+'_'+t.name] = [t.name, '', t.depth, t.weight, t.parentname]
    for i in range(len(t.children)):
        treeToDict(t.children[i],data)

# Create random dataset that leads to very long tree branches:
# A is an index for each set of data B which becomes one branch
rd.seed(23)
testSet = [''.join(l) for l in it.combinations(string.ascii_uppercase[:20],2)]
A = []
B = []
for i in range(10):
    for j in range(rd.randint(10,6000)):
        A.append(i)
        B.append(rd.choice(testSet))
dd = {"A":A,"B":B}
data = pd.DataFrame(dd)
# The maximum length should be above 5500, use another seed if it's not:
print data.groupby('A').count().max()

# Create the tree
root = treenode([],name='0')
for i in range(len(data.values)):
    if i == 0:
        newTree = add_child(root,data.values[i,1])
        oldses = data.values[i,0]
    else:
        if data.values[i,0] == oldses:
            newTree = add_child(newTree,data.values[i,1])
        else:
            newTree = add_child(root,data.values[i,1])
            oldses = data.values[i,0]

result={}
treeToDict(root,result)

PS: I'm aware the treeToDict() function is faulty in that it will overwrite entries because there can be duplicate keys. For this error this bug is unimportant however.

Upvotes: 2

Views: 1883

Answers (1)

CaptainTrunky
CaptainTrunky

Reputation: 1707

To my experience you have a problem not with stack size, but with an algorithm itself.

It's possible to implement tree traversal procedure without recursion at all. You should implement stack-based depth/breadth first search algorithm. Python-like pseudo-code might look like this:

stack = []
def traverse_tree(root):
  stack.append(root)
  while stack:
    cur = stack.pop()
    cur.do_some_awesome_stuff()
    stack.append(cur.get_children())

This approach is incredibly scalable and allows you to deal with any trees.

As further reading you can try this and that.

Upvotes: 4

Related Questions