O.rka

How to convert from NetworkX graph to ete3 Tree object?

I'm trying to figure out how to build an ete3.Tree object from a NetworkX directed graph. I added each child in the way I thought would produce the desired result, but I am having trouble.

import itertools
import ete3
import networkx as nx

edges = [('lvl-1', 'lvl-2.1'), ('lvl-1', 'lvl-2.2'), ('lvl-2.1', 'lvl-3.1'), ('lvl-2.1', 2), ('lvl-2.2', 4), ('lvl-2.2', 6), ('lvl-3.1', 'lvl-4.1'), ('lvl-3.1', 5), ('lvl-4.1', 1), ('lvl-4.1', 3), ('input', 'lvl-1')]
graph = nx.DiGraph()  # nx.OrderedDiGraph was removed in NetworkX 3.0; DiGraph keeps insertion order
graph.add_edges_from(edges)
nx.draw(graph, pos=nx.nx_agraph.graphviz_layout(graph, prog="dot"), with_labels=True, node_size=1000, node_color="lightgray")


tree = ete3.Tree()
for parent, children in itertools.groupby(graph.edges(), lambda edge:edge[0]):
    subtree = ete3.Tree(name=parent)
    for child in children:
        subtree.add_child(name=child[1])
    tree.add_child(child=subtree, name=parent)
print(tree) 
#       /-lvl-2.1
#    /-|
#   |   \-lvl-2.2
#   |
#   |   /-lvl-3.1
#   |--|
#   |   \-2
#   |
#   |   /-4
#   |--|
# --|   \-6
#   |
#   |   /-lvl-4.1
#   |--|
#   |   \-5
#   |
#   |   /-1
#   |--|
#   |   \-3
#   |
#    \- /-lvl-1

I've also tried the following but it did not work:

tree = ete3.Tree()
for parent, child in graph.edges():
    if parent not in tree:
        tree.add_child(name=parent)
    subtree = tree.search_nodes(name=parent)[0]
    subtree.add_child(name=child)
print(tree)
#                /-1
#             /-|
#          /-|   \-3
#         |  |
#       /-|   \-5
#      |  |
#    /-|   \-2
#   |  |
#   |  |   /-4
# --|   \-|
#   |      \-6
#   |
#    \- /-lvl-1

Answers (2)

O.rka

import ete3
import networkx as nx

# Graph
edges = [('lvl-1', 'lvl-2.1'), ('lvl-1', 'lvl-2.2'), ('lvl-2.1', 'lvl-3.1'), ('lvl-2.1', 2), ('lvl-2.2', 4), ('lvl-2.2', 6), ('lvl-3.1', 'lvl-4.1'), ('lvl-3.1', 5), ('lvl-4.1', 1), ('lvl-4.1', 3), ('input', 'lvl-1')]
G = nx.DiGraph()  # nx.OrderedDiGraph was removed in NetworkX 3.0; DiGraph keeps insertion order
G.add_edges_from(edges)

# Tree: one ete3 node per graph node, then hang each edge's child under its parent
root = "input"
subtrees = {node: ete3.Tree(name=node) for node in G.nodes()}
for parent, child in G.edges():
    subtrees[parent].add_child(subtrees[child])
tree = subtrees[root]
print(tree.get_ascii())
#                                /-1
#                         /lvl-4.1
#                  /lvl-3.1      \-3
#                 |      |
#           /lvl-2.1      \-5
#          |      |
# -inputlvl-1      \-2
#          |
#          |       /-4
#           \lvl-2.2
#                  \-6
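The self-answer hard-codes root = "input". Below is a stdlib-only sketch that instead finds the root as the one node with no incoming edge and serializes the tree to Newick, keeping internal node names. The helper name edges_to_newick is hypothetical, and it assumes the graph is a tree whose node names contain no Newick metacharacters such as ( ) , : ;. ete3 can read this kind of string with ete3.Tree(newick, format=1), where format 1 is the Newick flavour that keeps internal node names.

```python
from collections import defaultdict


def edges_to_newick(edges):
    """Serialize a rooted tree given as (parent, child) edges into Newick.

    Hypothetical helper: internal nodes keep their names (as in ete3's
    format=1). Assumes names contain no Newick metacharacters ( ) , : ;
    """
    children = defaultdict(list)
    has_parent = set()
    order = {}  # dict used as an insertion-ordered set of all node names
    for parent, child in edges:
        if child in has_parent:
            raise ValueError(f"{child!r} has more than one parent; not a tree")
        children[parent].append(child)
        has_parent.add(child)
        order.setdefault(parent, None)
        order.setdefault(child, None)

    # The root is the only node that never appears as a child
    roots = [node for node in order if node not in has_parent]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots!r}")

    def render(node):
        kids = children.get(node)  # .get() does not insert into the defaultdict
        if not kids:
            return str(node)  # leaf: just its name
        # internal node: (child1,child2,...)name
        return "(" + ",".join(render(kid) for kid in kids) + ")" + str(node)

    return render(roots[0]) + ";"


edges = [('lvl-1', 'lvl-2.1'), ('lvl-1', 'lvl-2.2'), ('lvl-2.1', 'lvl-3.1'), ('lvl-2.1', 2),
         ('lvl-2.2', 4), ('lvl-2.2', 6), ('lvl-3.1', 'lvl-4.1'), ('lvl-3.1', 5),
         ('lvl-4.1', 1), ('lvl-4.1', 3), ('input', 'lvl-1')]
newick = edges_to_newick(edges)
print(newick)
# (((((1,3)lvl-4.1,5)lvl-3.1,2)lvl-2.1,(4,6)lvl-2.2)lvl-1)input;
```

Newick is plain text, so after loading it into ete3 the numeric leaves should come back as strings ('2' rather than 2).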

Synedraacus

Building the subtrees and reading from the NetworkX object are fine; the problem is that you're adding every subtree directly to your original tree instance. In ete3, the Tree class is in fact just a node (TreeNode, which holds pointers to its descendants, if any), so tree.add_child attaches the new child nodes/subtrees directly to the root node.

What you should do instead is iterate over the leaves of the ete3 tree, find the one where node.name == parent, and attach all the children to it. Also, attach them one by one rather than pre-generating a subtree; otherwise you get an additional internal node with a single parent and a single child.
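To make this diagnosis concrete without needing ete3 installed, here is a sketch with a hypothetical stand-in Node class that only mimics the name / children / add_child parts of ete3's TreeNode. grouped_onto_root reproduces the first attempt and ends up with six branches on the root, one per parent, just like the first printout. attached_to_named_node hangs each child, one at a time, under the node that carries the parent's name. Instead of scanning the leaves it keeps a dict from name to node (a substitution on my part, the same idea as the self-answer's subtrees dict), which also copes with the ('input', 'lvl-1') edge arriving last.

```python
import itertools


class Node:
    """Hypothetical stand-in for ete3's TreeNode: a name plus a list of children."""

    def __init__(self, name=None):
        self.name = name
        self.children = []

    def add_child(self, child=None, name=None):
        # Same shape as ete3's add_child(child=None, name=None, ...):
        # attach an existing node, or create a new one with the given name
        if child is None:
            child = Node(name)
        self.children.append(child)
        return child


def grouped_onto_root(edges):
    """The question's first attempt: every parent's subtree goes onto the root."""
    root = Node()
    for parent, group in itertools.groupby(edges, key=lambda edge: edge[0]):
        subtree = Node(parent)
        for _, child in group:
            subtree.add_child(name=child)
        root.add_child(child=subtree)  # always the root, never the matching node
    return root


def attached_to_named_node(edges):
    """Attach each child, one at a time, under the node carrying the parent's name."""
    nodes = {}  # name -> Node, so each name maps to exactly one node

    def node_for(name):
        if name not in nodes:
            nodes[name] = Node(name)
        return nodes[name]

    has_parent = set()
    for parent, child in edges:
        node_for(parent).add_child(child=node_for(child))
        has_parent.add(child)
    # Nodes that never appear as a child are roots (exactly one for a tree)
    return [node for name, node in nodes.items() if name not in has_parent]


edges = [('lvl-1', 'lvl-2.1'), ('lvl-1', 'lvl-2.2'), ('lvl-2.1', 'lvl-3.1'), ('lvl-2.1', 2),
         ('lvl-2.2', 4), ('lvl-2.2', 6), ('lvl-3.1', 'lvl-4.1'), ('lvl-3.1', 5),
         ('lvl-4.1', 1), ('lvl-4.1', 3), ('input', 'lvl-1')]
flat = grouped_onto_root(edges)
print(len(flat.children))  # 6
(root,) = attached_to_named_node(edges)
print(root.name)  # input
```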

EDIT:

The second version of your code is almost right, but it attaches every parent it hasn't seen yet directly to the tree (i.e. the root), even when the root isn't that node's actual parent. That's probably why you get lvl-1 as a separate node which isn't a parent of other nodes: with your edge order, ('input', 'lvl-1') comes last, so 'input' is not in the tree yet, gets attached to the root, and a second 'lvl-1' is created under it. In general I'm not sure the NetworkX graph traversal order can be relied on, which may be important. A safer (if uglier) version would look like this:

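import ete3

# graph is the DiGraph built in the question
# Root node named 'input', so the ('input', 'lvl-1') edge has somewhere to attach
tree = ete3.Tree(name='input')
# A copy in a list, because you may not want to edit the original graph
edges = list(graph.edges)
while len(edges) > 0:
    unplaced = []
    for parent, child in edges:
        # Check if this edge's parent is already anywhere in the tree. Search all
        # nodes, not just leaves: a parent stops being a leaf after its first child.
        matches = tree.search_nodes(name=parent)
        if matches:
            # If it is, add the child and thus create the edge
            matches[0].add_child(name=child)
        else:
            # Parent not placed yet; keep the edge for the next pass
            unplaced.append((parent, child))
    # Bail out instead of looping forever if some edges can never be placed
    if len(unplaced) == len(edges):
        raise ValueError("edges not connected to the root: %r" % (unplaced,))
    # Now if there are edges still unplaced, try again.
    edges = unplaced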

There may be a couple of typos in there, and it's definitely very slow: something like O(n**2) in the number of edges, or worse, given all the repeated passes and node searches. There is probably a way to walk the graph from the root to the leaves, which wouldn't require a copy of the edge list and would work in a single pass. But this version will eventually produce a correct tree.
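The root-to-leaves walk mentioned above can be done with a breadth-first search. NetworkX provides nx.bfs_edges(G, source) for this; the sketch below does the same with only the standard library (the function names are hypothetical), assuming the graph is a tree and the root is known. Because every parent is emitted before the edges hanging below it, a single pass is enough to build the tree, here as nested dicts.

```python
from collections import defaultdict, deque


def edges_root_to_leaves(edges, root):
    """Return (parent, child) edges in breadth-first order starting from root.

    Every parent is emitted before any edge below it, so a single pass over
    the result can attach each child to an already-placed parent.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    ordered = []
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child in seen:
                raise ValueError(f"{child!r} reached twice; the graph is not a tree")
            seen.add(child)
            ordered.append((node, child))
            queue.append(child)
    return ordered


def build_nested(edges, root):
    """Single pass: nested dicts of the form {name: {child_name: {...}}}."""
    nodes = {root: {}}
    for parent, child in edges_root_to_leaves(edges, root):
        nodes[child] = {}
        nodes[parent][child] = nodes[child]  # parent is guaranteed to exist already
    return {root: nodes[root]}


edges = [('lvl-1', 'lvl-2.1'), ('lvl-1', 'lvl-2.2'), ('lvl-2.1', 'lvl-3.1'), ('lvl-2.1', 2),
         ('lvl-2.2', 4), ('lvl-2.2', 6), ('lvl-3.1', 'lvl-4.1'), ('lvl-3.1', 5),
         ('lvl-4.1', 1), ('lvl-4.1', 3), ('input', 'lvl-1')]
order = edges_root_to_leaves(edges, 'input')
print(order[0])  # ('input', 'lvl-1')
nested = build_nested(edges, 'input')
```

With ete3 the same single pass should work by starting from nodes = {root: ete3.Tree(name=root)} and doing nodes[child] = nodes[parent].add_child(name=child) for each edge from nx.bfs_edges(G, root), since add_child returns the newly created child.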
