clink
clink

Reputation: 213

Issue searching through a Trie

I have written a code that implements Trie data structure where it takes in a list of strings and the string's count.

lst = [['james',9],['chloe',20],['chlara',30]]

The strings are the names, and the integer value followed by it is the count. Once the trie is constructed, it prompts a user to enter a prefix,

userinput = ch

With that, the code will return the string chlara since it has a higher count compared to chloe which also has the prefix ch. I've managed to construct the Trie but I'm having difficulties with the search function.

class Node:
    def __init__(self):
        self.children = [None] * 26
        self.end = False
        self.frequency = 0
        self.goindex = 0
        self.index = 0

class Trie:
    def __init__(self):
        self.root = Node()

    def ord_char(self,key):
        ord_rep = ord(key) - ord('a')
        return ord_rep

    def add_word(self,lst):
        word = lst[0]    #word
        freq = lst[1]    #frequency of string

        word_len = len(word)    

        current = self.root    #start from root node

        for i in range(word_len):
            position = self.ord_char(word[i])

            if current.children[position] is None:
                current.children[position] = Node()

            current = current.children[position]

            if current.frequency > freq:
                continue
            else:
                current.frequency = freq
            current.index = position

        current.end = True  #end of string


def main():
    trie = Trie()

    for i in list2:
        trie.add_word(i)
    user = input("Enter a prefix: ")
    print(trie.prefix_search(user))

if __name__ == "__main__":
    main()

I'm being returned the incomplete string 'chla' and I'm pretty sure it's due to my search function being inefficient and not working properly.

updated

Issue I'm facing now is, if my prefix is 'aberr' I'm being returned 'aberration' instead of 'aberr'

Upvotes: 0

Views: 176

Answers (1)

Martijn Pieters
Martijn Pieters

Reputation: 1121834

You are never traversing your trie properly. You have two nested for loops, and so only traverse two more nodes (characters) from your prefix.

I'm going to assume you want to return one result, even if there are multiple strings with matching suffix and matching count.

Use a while loop, and keep following the highest count value until you get to a node with no more children with a value that's equal to the count of the current node. Do verify that you end is True for that node, as that would indicate a bug in your word-adding code:

def prefix_search(self, prefix):
    # traverse the prefix
    current = self.root
    for c in prefix:
        current = current.children[self.ord_char(c)]
        if current is None:
            return None  # None is a much better return value than -1

    while current is not None:
        for child in current.children:
            if child is None:
                continue
            if child.count == current.count:
                # found first child with same score, continue from here
                prefix += chr(child.index + 97)
                current = child
                break
        else:
            # no children with equal score, this is the longest match
            assert current.end, "Data structure inconsistent"
            return prefix

Demo:

>>> trie.prefix_search('ch')
'chlara'
>>> trie.prefix_search('j')
'james'

and some of those edge-cases:

>>> trie.add_word(('chlarissa', 9))  # longer word, lower count!
>>> trie.prefix_search('ch')
'chlara'
>>> trie.add_word(('janet', 6))  # same length and score as james, won't be found
>>> trie.prefix_search('j')
'james'

and if there was an error in the data structure; this deliberately sets a wrong count:

>>> trie.root.children[9].children[0].count = 7  # 'ja', 7, so higher, but not an end node
>>> trie.prefix_search('j')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 59, in prefix_search
AssertionError: Data structure inconsistent

Upvotes: 1

Related Questions