Optimizing a binary tree function, huffman trees

So the scenario is someone you know gives you a huffman tree but its not optimal (I know all huffman trees are optimal, just if it were hypothetically not optimal but does follow the huffman style of only leaves having values).

The function should improve the tree as much as possible without changing the actual 'shape' of it with the aid of a dictionary mapping the each symbol to the number of occurrences it has in a hypothetical text you are compressing. The function does this by swapping nodes. So the end result won't necessarily be an optimal tree but it will be improved as much as possible. For example....

Class Node:
    def __init__(self, item = None, left = None, right = None):
        self.item = item
        self.left = left
        self.right = right

     def __repr__(self):
         return 'Node({}, {}, {})'.format(self.item, self.left, self.right)

dictionary = {54: 12, 101: 34, 29: 22, 65: 3, 20: 13}

Your friend gives you...

Node(None, Node(None, Node(20), Node(54)), Node(None, Node(65), Node(None, Node(101), Node(29)))

or...

               None  
          /     |     \
     None       |       None
   /      \     |     /      \
20          54  |  65       None
                |         /      \
                |      101        29

Where the wanted result would be...

Node(None, Node(None, Node(20), Node(29)), Node(None, Node(101), Node(None, Node(65), Node(54)))

or...

               None  
          /     |     \
     None       |       None
   /      \     |     /      \
20          29  |  101       None
                |         /      \
                |       65        54

How do I locate a leaf node then locate where it's supposed to be, swap it, then do that for all other leaf nodes while also making sure that the shape of the tree is the same, regardless of if it's optimal or not? Also this is in python.

Upvotes: 1

Answers (1)

Jasmeet Brar

Reputation: 399

From the basic technique of constructing Huffman Trees, the nodes whose value are least probable are the first ones to be linked to a parent node. Those nodes appear deeper within the Huffman Trees than any other nodes in them. From this, we can deduce the fact that the deeper within the tree you go, the less frequent values you would encounter.

This analogy is crucial into developing an optimization function, since we don't need to perform all sorts of swapping when we can get it right the first time by: getting a list of all the items in the tree sorted by depth and their matched values in order; and insert them in their respective depths whenever there are leaves. Here's the solution that I coded:

def optimize_tree(tree, dictionary):

    def grab_items(tree):
        if tree.item:
            return [tree.item]
        else:
            return grab_items(tree.left) + grab_items(tree.right)

    def grab_depth_info(tree):
        def _grab_depth_info(tree,depth):
            if tree.item:
                return {depth:1}
            else:
                depth_info_list = [_grab_depth_info(child,depth+1) for child in [tree.left, tree.right]]
                depth_info = depth_info_list[0]
                for depth in depth_info_list[1]:
                    if depth in depth_info:
                        depth_info[depth] += depth_info_list[1][depth]
                    else:
                        depth_info[depth] = depth_info_list[1][depth]
                return depth_info

        return _grab_depth_info(tree,0)

    def make_inverse_dictionary(dictionary):
        inv_dictionary = {}
        for key in dictionary:
            if dictionary[key] in inv_dictionary:
                inv_dictionary[dictionary[key]].append(key)
            else:
                inv_dictionary[dictionary[key]] = [key]

        for key in inv_dictionary:
            inv_dictionary[key].sort()

        return inv_dictionary

    def get_depth_to_items(depth_info,actual_values):
        depth_to_items = {}
        for depth in depth_info:
            depth_to_items[depth] = []
            for i in range(depth_info[depth]):
                depth_to_items[depth].append(actual_values[i])

            depth_to_items[depth].sort()
            del actual_values[:depth+1]

        return depth_to_items

    def update_tree(tree,depth_to_items,reference):
        def _update_tree(tree,depth,depth_to_items,reference):
            if tree.item:
                tree.item = reference[depth_to_items[depth].pop(0)].pop(0)
            else:
                for child in [tree.left,tree.right]:
                    _update_tree(child,depth+1,depth_to_items,reference)
        _update_tree(tree,0,depth_to_items,reference)

    items = grab_items(tree)
    depth_info = grab_depth_info(tree)
    actual_values = [dictionary[item] for item in items]
    actual_values.sort(reverse=True)
    inv_dictionary = make_inverse_dictionary(dictionary)

    depth_to_items = get_depth_to_items(depth_info,actual_values)

    update_tree(tree,depth_to_items,inv_dictionary)

Explanation:

The optimize_tree function requires the user to pass in two arguments:

tree: the root node of the Huffman Tree.
dictionary: the dictionary that maps the symbols to their frequency.

The function starts off by defining four inner functions:

grab_items is a function that takes in a tree and returns a list of all the items in it.
grab_depth_info returns a dictionary where the keys are the depth levels and the values are the number of nodes at the level.
make_inverse_dictionary returns a dictionary that is the inverse of the given dictionary. (It can handle cases where the values can be mapped to two keys.)
get_depth_to_items returns a dictionary where they keys are the depth levels and the values are lists of actual values (from the dictionary) that are supposed to be at that level in order for the tree to be optimized.
update_tree inserts the items where they are supposed to be in order to make the tree optimized.

Note: grab_depth_info and update_tree has an inner function defined in them so that their functionality can work recursively.

These four inner functions are needed for the following algorithm:

First the function grabs a list of items and the depth information from the tree.
Then it uses the list of items to grab the list of actual values from the given dictionary, and have it in descending order. (So that least frequent values are matched with the greatest depth level in step 4.)
Next it makes an inverse of the given dictionary, where the keys and values are swapped. (This is to help with step 5.)
After making those preparations, the function will then pass the depth info and the list of actual values into the get_depth_to_items function to get a dictionary of depth level to list of values of inorder.
Finally, the function passes in the tree, the dictionary that was made in the previous step, and the inverted dictionary into the update_tree function which will use its inner function to recursively go to every node in the tree and update the item attribute using the original keys, from the inverted dictionary.

The result of using this algorithm will make the tree that you passed in be in its most optimized state, without changing the actual shape of it.

I can confirm that this works by executing these lines of code:

tree = Node(None, Node(None, Node(20), Node(29)), Node(None, Node(101), Node(None, Node(65), Node(54))))
dictionary = {54: 12, 101: 34, 29: 22, 65: 3, 20: 13}
optimize_tree(tree,dictionary)
print(tree)

And the output of this is:

Node(None, Node(None, Node(20, None, None), Node(29, None, None)), Node(None, Node(101, None, None), Node(None, Node(65, None, None), Node(54, None, None))))

Upvotes: 1

Optimizing a binary tree function, huffman trees

Answers (1)

Explanation:

Related Questions