James Chang

Collect Suffixes for all Complete Words below a Trie Node (using recursion in Python)

I need to add the ability to list suffixes to implement our autocomplete feature. To do that, I want to implement a method on the TrieNode object that returns all complete-word suffixes that exist below it in the trie. For example, if our Trie contains the words ["fun", "function", "factory"] and we ask for suffixes from the f node, we would expect to receive ["un", "unction", "actory"] back from node.get_suffixes(). Here is how I got started:

class TrieNode:

    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self):
        pass

I first wrote and tested get_suffixes as a standalone function that collects its results in a global list, and it seemed to work fine:

result = []
def get_suffixes(node, suffix=""):
    if not node.children == dict():
        for key in node.children:
            suffix += key
            if node.children[key].word_end:
                result.append(suffix)
            get_suffixes(node.children[key], suffix)
            suffix = suffix[:-1]
    return result

Here is how I tested the function:

# Create a mock trie for the test
node = TrieNode()
node.insert("A")
node.children["A"].word_end = True
node.children["A"].insert("t")
node.children["A"].children["t"].word_end = True
node.children["A"].insert("b")
node.children["A"].children["b"].insert("a")
node.children["A"].children["b"].children["a"].insert("c")
node.children["A"].children["b"].children["a"].children["c"].insert("a")
node.children["A"].children["b"].children["a"].children["c"].children["a"].word_end = True
node.children["A"].insert("d")
node.children["A"].children["d"].insert("d")
node.children["A"].children["d"].children["d"].word_end = True
node.children["A"].children["d"].insert("m")
node.children["A"].children["d"].children["m"].insert("i")
node.children["A"].children["d"].children["m"].children["i"].insert("n")
node.children["A"].children["d"].children["m"].children["i"].children["n"].word_end = True

result = []
def get_suffixes(node, suffix=""):
    if not node.children == dict():
        for key in node.children:
            suffix += key
            if node.children[key].word_end:
                result.append(suffix)
            get_suffixes(node.children[key], suffix)
            suffix = suffix[:-1]
    return result

get_suffixes(node.children["A"]) # Returns ['t', 'baca', 'dd', 'dmin'], as expected

The problem occurred when I tried moving the get_suffixes function into the TrieNode class. There, I do not know how to handle the global variable result, since it is no longer supposed to be global. I have tried two versions:

Version I: make result an instance attribute

class TrieNode:

    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()
        self.result = []

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self, suffix=""):
        if not self.children == dict():
            for key in self.children:
                suffix += key
                if self.children[key].word_end:
                    self.result.append(suffix)
                self.children[key].get_suffixes(suffix)
                suffix = suffix[:-1]   
        return self.result 

node.children["A"].get_suffixes() # Returns ['t'], which is wrong
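One way to see what happens in Version I is to inspect the result lists of the deeper nodes after a single call. The sketch below is a trimmed copy of Version I (the empty-children check is dropped, which does not change its behavior) run on a smaller mock trie; the variable names root and a are only for illustration.

```python
class TrieNode:
    """Trimmed copy of Version I: every node owns its own result list."""

    def __init__(self):
        self.word_end = False
        self.children = dict()
        self.result = []

    def insert(self, char):
        if char not in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self, suffix=""):
        for key in self.children:
            suffix += key
            if self.children[key].word_end:
                # Appends to THIS node's list only
                self.result.append(suffix)
            # The child fills its own list; the returned list is discarded
            self.children[key].get_suffixes(suffix)
            suffix = suffix[:-1]
        return self.result


# Smaller mock trie containing "At" and "Add"
root = TrieNode()
root.insert("A")
a = root.children["A"]
a.insert("t")
a.children["t"].word_end = True
a.insert("d")
a.children["d"].insert("d")
a.children["d"].children["d"].word_end = True

print(a.get_suffixes())        # ['t']
print(a.children["d"].result)  # ['dd'], stored on the "d" node instead
print(a.children["t"].result)  # []
```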

Version II: make result a default function parameter

class TrieNode:

    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def suffixes(self, suffix="", result=[]):
        if not self.children == dict():
            for key in self.children:
                suffix += key
                if self.children[key].word_end:
                    result.append(suffix)
                self.children[key].suffixes(suffix)
                suffix = suffix[:-1]   
        return result

node.children["A"].suffixes() # Returns ['t', 'baca', 'dd', 'dmin']
node.children["A"].suffixes() # Returns ['t', 'baca', 'dd', 'dmin', 't', 'baca', 'dd', 'dmin']

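The result of Version II is not surprising: the recursive call self.children[key].suffixes(suffix) never passes result, so every call, recursive or top-level, appends to the same default list, just like in this smaller example: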

def append(number, number_list=[]):
    number_list.append(number)
    print(number_list)
    return number_list

append(5) # expecting: [5], actual: [5]
append(7) # expecting: [7], actual: [5, 7]
append(2) # expecting: [2], actual: [5, 7, 2]
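This happens because Python evaluates a default value once, when the def statement runs, and keeps that single object on the function (it can be inspected through __defaults__). The usual workaround is a None default with a fresh list created inside the function. In the sketch below, append_shared is just a renamed copy of the example above, and append is the None-sentinel variant.

```python
def append_shared(number, number_list=[]):
    # Same as the example above: this default list is created only once
    number_list.append(number)
    return number_list


append_shared(5)
append_shared(7)
print(append_shared.__defaults__)  # ([5, 7],)


def append(number, number_list=None):
    # None is immutable, so it is safe as a default; a new list is
    # created on every call that does not pass one in
    if number_list is None:
        number_list = []
    number_list.append(number)
    return number_list


print(append(5))  # [5]
print(append(7))  # [7]
```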

I am learning algorithms and data structures in Python, and I was asked to solve this with a recursive function. Other approaches, such as the one in "Implementing a Trie to support autocomplete in Python", are not the answers I am looking for, even though they might solve the problem. I am mostly curious why self.result is not properly filled in Version I, when the same logic works if the function does not live in a class.

Answers (1)

asds_asds

In Version I, result is an attribute of each TrieNode instance, not a single list shared by the whole trie.

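When you return self.result from the get_suffixes method, you only return the suffixes that were appended to the current node's own list. The recursive call self.children[key].get_suffixes(suffix) appends to the child's own list, and its return value is discarded.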

You need to include the suffixes found by the children as well. Thanks to recursion, the code only needs a minor change: adding self.result += self.children[key].get_suffixes(suffix) merges each child's results into the parent's list.

class TrieNode:
    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()
        self.result = []

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self, suffix=""):
        if not self.children == dict():
            for key in self.children:
                suffix += key
                if self.children[key].word_end:
                    self.result.append(suffix)
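                # Always descend: a complete word can be a prefix of a longer
                # one (e.g. "fun" and "function"), so don't stop at word_end.
                self.result += self.children[key].get_suffixes(suffix)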
                suffix = suffix[:-1]   
        return self.result 



# Create a mock trie for the test
node = TrieNode()
node.insert("A")
node.children["A"].word_end = True
node.children["A"].insert("t")
node.children["A"].children["t"].word_end = True
node.children["A"].insert("b")
node.children["A"].children["b"].insert("a")
node.children["A"].children["b"].children["a"].insert("c")
node.children["A"].children["b"].children["a"].children["c"].insert("a")
node.children["A"].children["b"].children["a"].children["c"].children["a"].word_end = True
node.children["A"].insert("d")
node.children["A"].children["d"].insert("d")
node.children["A"].children["d"].children["d"].word_end = True
node.children["A"].children["d"].insert("m")
node.children["A"].children["d"].children["m"].insert("i")
node.children["A"].children["d"].children["m"].children["i"].insert("n")
node.children["A"].children["d"].children["m"].children["i"].children["n"].word_end = True


print(node.children["A"].get_suffixes())

Output:-

['t', 'baca', 'dd', 'dmin']

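The thing to remember is that every child is a new instance of the TrieNode class and therefore has its own separate result list. Note that this version still keeps self.result between calls, so calling get_suffixes() twice on the same node returns duplicated entries; the version below avoids that by building a fresh list on every call.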
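To keep the shape of Version II from the question without the shared default, a common pattern is to default result to None, create the list only at the top-level call, and pass it down explicitly to every recursive call. A minimal sketch, assuming the question's single-character insert and a trimmed mock trie:

```python
class TrieNode:
    def __init__(self):
        self.word_end = False
        self.children = dict()

    def insert(self, char):
        if char not in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self, suffix="", result=None):
        # A new list is created only at the top-level call; recursive calls
        # receive the same list explicitly, so nothing leaks between
        # separate top-level calls.
        if result is None:
            result = []
        for key, child in self.children.items():
            if child.word_end:
                result.append(suffix + key)
            child.get_suffixes(suffix + key, result)
        return result


# Mock trie containing "At", "Add" and "Admin"
root = TrieNode()
root.insert("A")
a = root.children["A"]
a.insert("t")
a.children["t"].word_end = True
a.insert("d")
d = a.children["d"]
d.insert("d")
d.children["d"].word_end = True
d.insert("m")
d.children["m"].insert("i")
d.children["m"].children["i"].insert("n")
d.children["m"].children["i"].children["n"].word_end = True

print(a.get_suffixes())  # ['t', 'dd', 'dmin']
print(a.get_suffixes())  # ['t', 'dd', 'dmin'] again, no duplicates
```

Passing suffix + key into the call, instead of appending to suffix and trimming it with suffix[:-1], leaves the caller's string untouched and removes that bookkeeping.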

Modified Insertion + No Result Array:-

class TrieNode:
    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()

    def insert(self, string):
        if len(string) == 0:
            self.word_end = True
            return
        ## Add a child node in this Trie
        if not string[0] in self.children:
            self.children[string[0]] = TrieNode()
        self.children[string[0]].insert(string[1:])

    def get_suffixes(self, suffix=""):
        query_result=[]
        if self.word_end:
            query_result.append(suffix)
        for i in self.children:
            query_result+=self.children[i].get_suffixes(suffix+i)
        return query_result




# Create a mock trie for the test
node = TrieNode()
node.insert("Add")
node.insert("At")
node.insert("Abaca")
node.insert("Admin")

print(node.children["A"].get_suffixes())
print(node.children["A"].get_suffixes())
print(node.children["A"].children["t"].get_suffixes())

Output:-

['dd', 'dmin', 't', 'baca']
['dd', 'dmin', 't', 'baca']
['']
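As a quick check against the example from the question (["fun", "function", "factory"]), the second version can be run as below. The autocomplete function is a hypothetical helper, not part of the original code; it sketches how the suffixes might be turned back into full words by first walking down to the prefix node.

```python
class TrieNode:
    def __init__(self):
        self.word_end = False
        self.children = dict()

    def insert(self, string):
        # Recursive insert: consume one character per trie level
        if len(string) == 0:
            self.word_end = True
            return
        if string[0] not in self.children:
            self.children[string[0]] = TrieNode()
        self.children[string[0]].insert(string[1:])

    def get_suffixes(self, suffix=""):
        # Fresh list on every call, filled from this node and its subtree
        query_result = []
        if self.word_end:
            query_result.append(suffix)
        for char in self.children:
            query_result += self.children[char].get_suffixes(suffix + char)
        return query_result


def autocomplete(root, prefix):
    # Hypothetical helper: walk down to the node for `prefix`,
    # then prepend the prefix to every suffix found below it.
    node = root
    for char in prefix:
        if char not in node.children:
            return []
        node = node.children[char]
    return [prefix + s for s in node.get_suffixes()]


root = TrieNode()
for word in ["fun", "function", "factory"]:
    root.insert(word)

print(root.children["f"].get_suffixes())  # ['un', 'unction', 'actory']
print(autocomplete(root, "fun"))          # ['fun', 'function']
```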
