Reputation: 213
I have written a code that implements Trie data structure where it takes in a list of strings and the string's count.
lst = [['james',9],['chloe',20],['chlara',30]]
The strings are the names, and the integer value followed by it is the count. Once the trie is constructed, it prompts a user to enter a prefix,
userinput = ch
With that, the code will return the string chlara since it has a higher count compared to chloe which also has the prefix ch. I've managed to construct the Trie but I'm having difficulties with the search function.
class Node:
def __init__(self):
self.children = [None] * 26
self.end = False
self.frequency = 0
self.goindex = 0
self.index = 0
class Trie:
def __init__(self):
self.root = Node()
def ord_char(self,key):
ord_rep = ord(key) - ord('a')
return ord_rep
def add_word(self,lst):
word = lst[0] #word
freq = lst[1] #frequency of string
word_len = len(word)
current = self.root #start from root node
for i in range(word_len):
position = self.ord_char(word[i])
if current.children[position] is None:
current.children[position] = Node()
current = current.children[position]
if current.frequency > freq:
continue
else:
current.frequency = freq
current.index = position
current.end = True #end of string
def main():
trie = Trie()
for i in list2:
trie.add_word(i)
user = input("Enter a prefix: ")
print(trie.prefix_search(user))
if __name__ == "__main__":
main()
I'm being returned the incomplete string 'chla' and I'm pretty sure it's due to my search function being inefficient and not working properly.
updated
Issue I'm facing now is, if my prefix is 'aberr' I'm being returned 'aberration' instead of 'aberr'
Upvotes: 0
Views: 176
Reputation: 1121834
You are never traversing your trie properly. You have two nested for
loops, and so only traverse two more nodes (characters) from your prefix.
I'm going to assume you want to return one result, even if there are multiple strings with matching suffix and matching count.
Use a while
loop, and keep following the highest count
value until you get to a node with no more children with a value that's equal to the count of the current node. Do verify that you end
is True for that node, as that would indicate a bug in your word-adding code:
def prefix_search(self, prefix):
# traverse the prefix
current = self.root
for c in prefix:
current = current.children[self.ord_char(c)]
if current is None:
return None # None is a much better return value than -1
while current is not None:
for child in current.children:
if child is None:
continue
if child.count == current.count:
# found first child with same score, continue from here
prefix += chr(child.index + 97)
current = child
break
else:
# no children with equal score, this is the longest match
assert current.end, "Data structure inconsistent"
return prefix
Demo:
>>> trie.prefix_search('ch')
'chlara'
>>> trie.prefix_search('j')
'james'
and some of those edge-cases:
>>> trie.add_word(('chlarissa', 9)) # longer word, lower count!
>>> trie.prefix_search('ch')
'chlara'
>>> trie.add_word(('janet', 6)) # same length and score as james, won't be found
>>> trie.prefix_search('j')
'james'
and if there was an error in the data structure; this deliberately sets a wrong count:
>>> trie.root.children[9].children[0].count = 7 # 'ja', 7, so higher, but not an end node
>>> trie.prefix_search('j')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 59, in prefix_search
AssertionError: Data structure inconsistent
Upvotes: 1