wcarhart
wcarhart

Reputation: 2773

How to flatten a tree by "bubbling up" values of each subtree?

I have the following tree data structure:

class TreeNode:
    def __init__(self, path, value=None):
        self.path = path
        self.value = value
        self.children = []

Where path is a directory or filename. If the the value of path is a file (a leaf in the tree), then the value is an integer, otherwise it is None. Here's an example of this tree structure:

└── root (None)
    ├── dir0 (None)
    |   ├── dir00 (None)
    |   |   └── file000.txt (10)
    |   └── file00.txt (10)
    ├── dir1 (None)
    |   └── file10.txt (5)
    ├── dir2 (None)
    |   ├── file20.txt (10)
    |   └── file21.txt (15)
    └── dir3 (None)
        ├── dir30 (None)
        |   └── file300.txt (15)
        └── file30.txt (10)

I am trying to return the smallest possible flattened list of resolved paths and their associated value. If all of the nodes in a subtree have the same value, then we say that such a subtree has the same value as its nodes. Essentially, the value for each node bubbles up to its parent if all children of the parent have the same value.

For example, what should be returned with the above tree is:

/root/dir0: 10
/root/dir1: 5
/root/dir2/file20.txt: 10
/root/dir2/file21.txt: 15
/root/dir3/dir30: 15
/root/dir3/file30.txt: 10

I've tried a couple different ways to accomplish this: traversing the tree with a stack, traversing the tree with recursion, and using sets; all have been unsuccessful. My most recent attempt's pseudocode looks like:

def build_list(self, treenode):
    if treenode.value:
        return [(treenode.path, treenode.value)]
    if treenode.value == None:
        s = set()
        for child in treenode.children:
            potential_values = self.build_list(child)
            for val in potential_values:
                s |= {val[1]}
        if len(s) == 1:
            return [(treenode.path, treenode.value)]
        else:
            return [(child.path, child.value) for child in treenode.children]

How would I accomplish this? Pseudocode is totally fine, I'm looking for an approach, not necessarily a full implementation.

Upvotes: 1

Views: 780

Answers (2)

dizzy77
dizzy77

Reputation: 543

Here's a way to do it. I'm using a different node class to make things a little easier to setup:

class TreeNode:

    def __init__(self, node_id, value=None):
        self.node_id = node_id
        self.value = value
        self.children = []

    def addChildren(self, *node_list):
        self.children += node_list

    def __repr__(self):
        return f'<TreeNode(node_id={self.node_id}, value={self.value})'

The __repr__() is just there to get a human-readable representation of an instance when printing.

I attempted to set up the tree to mirror your example, but feel free to let me know if it's not exactly right. (It could be done more succinctly, but I've written one line per object for clarity.)

root = TreeNode('root')

dir0 = TreeNode('dir0')
dir00 = TreeNode('dir00')
file00 = TreeNode('file00', 10)
file000 = TreeNode('file000', 10)

dir1 = TreeNode('dir1')
file10 = TreeNode('file10', 5)

dir2 = TreeNode('dir2')
file20 = TreeNode('file20', 10)
file21 = TreeNode('file21', 15)

dir3 = TreeNode('dir3')
dir30 = TreeNode('dir30')
file30 = TreeNode('file30', 10)
file300 = TreeNode('file300', 15)

root.addChildren(dir0, dir1, dir2, dir3)

dir0.addChildren(dir00, file00)
dir00.addChildren(file000)

dir1.addChildren(file10)

dir2.addChildren(file20, file21)

dir3.addChildren(dir30, file30)
dir30.addChildren(file300)

Now for the recusive function:

def buildList(node):
    # Return node id and value if no children
    if not node.children:
        return [(node.node_id, node.value)]

    # Call buildList on each child and get distinct values
    childitems = [item for child in node.children for item in buildList(child)]
    childvalues = set(childitem[1] for childitem in childitems)

    # If value is unique, return this node as if it has the unique value
    if len(childvalues) == 1:
        return [(node.node_id, childvalues.pop())]

    # Otherwise, return all results
    return childitems

This produces the following output:

>>> buildList(root)

[('dir0', 10),
 ('dir1', 5),
 ('file20', 10),
 ('file21', 15),
 ('dir30', 15),
 ('file30', 10)]

[Edit] Note that this returns a list and doesn't mutate the nodes.

Regarding your particular implementation, you might want to think about what happens if there's an empty directory. Is that allowed? If so, it's a leaf but not a file. Will it have a value in that case?

Upvotes: 0

Tim
Tim

Reputation: 3407

Method 1

Recursion should work:

def recurse_update_value(treenode):
    if not treenode.children:
        return
    for child in treenode.children:
        recurse_update_value(child)
    if all(x.value==treenode.children[0].value for x in treenode.children):
        treenode.value = treenode.children[0].value
        treenode.children = []

Method 2

Additionally, if you can edit the TreeNode class, you can set the getter method to automatically update the children.

class TreeNode:
    def __init__(self, path, value=None):
        self.path = path
        self._value = value
        self.children = []

    @property
    def value(self)
        if not self.children:
            return self._value
        first_child_value = self.children[0].value
        if all(x.value==first_child_value for x in self.children)
            self._value = first_child_value
            self.children = []
        return self._value

Then, you simply have to call topnode.value to update the tree.

Upvotes: 3

Related Questions