Reputation: 11
So the scenario is someone you know gives you a huffman tree but its not optimal (I know all huffman trees are optimal, just if it were hypothetically not optimal but does follow the huffman style of only leaves having values).
The function should improve the tree as much as possible without changing the actual 'shape' of it with the aid of a dictionary mapping the each symbol to the number of occurrences it has in a hypothetical text you are compressing. The function does this by swapping nodes. So the end result won't necessarily be an optimal tree but it will be improved as much as possible. For example....
Class Node:
def __init__(self, item = None, left = None, right = None):
self.item = item
self.left = left
self.right = right
def __repr__(self):
return 'Node({}, {}, {})'.format(self.item, self.left, self.right)
dictionary = {54: 12, 101: 34, 29: 22, 65: 3, 20: 13}
Your friend gives you...
Node(None, Node(None, Node(20), Node(54)), Node(None, Node(65), Node(None, Node(101), Node(29)))
or...
None
/ | \
None | None
/ \ | / \
20 54 | 65 None
| / \
| 101 29
Where the wanted result would be...
Node(None, Node(None, Node(20), Node(29)), Node(None, Node(101), Node(None, Node(65), Node(54)))
or...
None
/ | \
None | None
/ \ | / \
20 29 | 101 None
| / \
| 65 54
How do I locate a leaf node then locate where it's supposed to be, swap it, then do that for all other leaf nodes while also making sure that the shape of the tree is the same, regardless of if it's optimal or not? Also this is in python.
Upvotes: 1
Views: 730
Reputation: 399
From the basic technique of constructing Huffman Trees, the nodes whose value are least probable are the first ones to be linked to a parent node. Those nodes appear deeper within the Huffman Trees than any other nodes in them. From this, we can deduce the fact that the deeper within the tree you go, the less frequent values you would encounter.
This analogy is crucial into developing an optimization function, since we don't need to perform all sorts of swapping when we can get it right the first time by: getting a list of all the items in the tree sorted by depth and their matched values in order; and insert them in their respective depths whenever there are leaves. Here's the solution that I coded:
def optimize_tree(tree, dictionary):
def grab_items(tree):
if tree.item:
return [tree.item]
else:
return grab_items(tree.left) + grab_items(tree.right)
def grab_depth_info(tree):
def _grab_depth_info(tree,depth):
if tree.item:
return {depth:1}
else:
depth_info_list = [_grab_depth_info(child,depth+1) for child in [tree.left, tree.right]]
depth_info = depth_info_list[0]
for depth in depth_info_list[1]:
if depth in depth_info:
depth_info[depth] += depth_info_list[1][depth]
else:
depth_info[depth] = depth_info_list[1][depth]
return depth_info
return _grab_depth_info(tree,0)
def make_inverse_dictionary(dictionary):
inv_dictionary = {}
for key in dictionary:
if dictionary[key] in inv_dictionary:
inv_dictionary[dictionary[key]].append(key)
else:
inv_dictionary[dictionary[key]] = [key]
for key in inv_dictionary:
inv_dictionary[key].sort()
return inv_dictionary
def get_depth_to_items(depth_info,actual_values):
depth_to_items = {}
for depth in depth_info:
depth_to_items[depth] = []
for i in range(depth_info[depth]):
depth_to_items[depth].append(actual_values[i])
depth_to_items[depth].sort()
del actual_values[:depth+1]
return depth_to_items
def update_tree(tree,depth_to_items,reference):
def _update_tree(tree,depth,depth_to_items,reference):
if tree.item:
tree.item = reference[depth_to_items[depth].pop(0)].pop(0)
else:
for child in [tree.left,tree.right]:
_update_tree(child,depth+1,depth_to_items,reference)
_update_tree(tree,0,depth_to_items,reference)
items = grab_items(tree)
depth_info = grab_depth_info(tree)
actual_values = [dictionary[item] for item in items]
actual_values.sort(reverse=True)
inv_dictionary = make_inverse_dictionary(dictionary)
depth_to_items = get_depth_to_items(depth_info,actual_values)
update_tree(tree,depth_to_items,inv_dictionary)
The optimize_tree
function requires the user to pass in two arguments:
tree
: the root node of the Huffman Tree.dictionary
: the dictionary that maps the symbols to their frequency.The function starts off by defining four inner functions:
grab_items
is a function that takes in a tree and returns a list of all the items in it.grab_depth_info
returns a dictionary where the keys are the depth levels and the values are the number of nodes at the level.make_inverse_dictionary
returns a dictionary that is the inverse of the given dictionary. (It can handle cases where the values can be mapped to two keys.)get_depth_to_items
returns a dictionary where they keys are the depth levels and the values are lists of actual values (from the dictionary) that are supposed to be at that level in order for the tree to be optimized.update_tree
inserts the items where they are supposed to be in order to make the tree optimized.Note: grab_depth_info
and update_tree
has an inner function defined in them so that their functionality can work recursively.
These four inner functions are needed for the following algorithm:
get_depth_to_items
function to get a dictionary of depth level to list of values of inorder.update_tree
function which will use its inner function to recursively go to every node in the tree and update the item attribute using the original keys, from the inverted dictionary.The result of using this algorithm will make the tree that you passed in be in its most optimized state, without changing the actual shape of it.
I can confirm that this works by executing these lines of code:
tree = Node(None, Node(None, Node(20), Node(29)), Node(None, Node(101), Node(None, Node(65), Node(54))))
dictionary = {54: 12, 101: 34, 29: 22, 65: 3, 20: 13}
optimize_tree(tree,dictionary)
print(tree)
And the output of this is:
Node(None, Node(None, Node(20, None, None), Node(29, None, None)), Node(None, Node(101, None, None), Node(None, Node(65, None, None), Node(54, None, None))))
Upvotes: 1