timaschew
timaschew

Reputation: 16602

How to calculate overall distances from lowest root(s) of a directed graph with networkx

If you look at this DAG (directed acyclic graph):

I want to create a dict which maps the distance from the lowest node(s) to all others nodes which is similar to the x position (height) of each node from the bottom in the rendered graph. For that given graph it would be:

distance_nodes_map: {
  0: {'base-zero', 'base-one'}, 
  1: {'low-b', 'low-a', 'low-c'}, 
  3: {'high-x', 'high-z', 'high-y'}, 
  2: {'mid-r', 'mid-q', 'mid-p'}, 
  4: {'super'}
}

I wrote an algorithm which worked for that graph above but then I've tested another graph and it didn't work anymore. I tried some algorithms and functions like shortest path or descendants_at_distance but I don't think they are really helpful as an input to calculate the distances.

My algorithm doesn't work for instance for this graph:

https://gist.github.com/timaschew/3b08a07243fa6f43773014ef5e705c96

Here is gist which contains:

Upvotes: 2

Views: 667

Answers (2)

Riccardo Bucco
Riccardo Bucco

Reputation: 15364

You are looking for an algorithm that draws a layered graph. There are many different algorithms, and you should choose the one that best fit your needs (for example, have a look at the following paper A Technique for Drawing Directed Graphs by Gansner et al.).

Many of those algorithms are already implemented in Graphviz (a very famous and powerful graph visualization software). Once you have installed it, it's pretty straightforward to compute the result you are looking for (G is your directed acyclic graph built using networkx.DiGraph):

from networkx.drawing.nx_agraph import graphviz_layout

def get_distance_nodes_map(G):
    pos = graphviz_layout(G, prog='dot')
    coor = sorted({y for k, (x, y) in pos.items()})
    kmap = dict(zip(coor, range(len(coor))))
    distance_nodes_map = {level: set() for level in kmap.values()}
    for k, (x, y) in pos.items():
        distance_nodes_map[kmap[y]].add(k)
    return distance_nodes_map

Here are a couple of examples using data that you provided:

>>> from networkx import DiGraph
>>> from pprint import PrettyPrinter
>>> pp = PrettyPrinter()
>>> G1 = DiGraph()
>>> G1.add_edges_from([('super', 'high-x'), ('high-x', 'mid-p'),
...                    ('mid-p', 'low-b'), ('mid-p', 'low-c'),
...                    ('low-c', 'base-zero'), ('low-c', 'base-one'),
...                    ('high-y', 'mid-p'), ('high-y', 'base-zero'),
...                    ('high-z', 'base-one'), ('high-z', 'mid-r'),
...                    ('high-z', 'mid-q'), ('mid-q', 'low-a'),
...                    ('low-a', 'base-one')])
>>> pp.pprint(get_distance_nodes_map(G1))
{0: {'base-one', 'base-zero'},
 1: {'low-a', 'low-b', 'low-c'},
 2: {'mid-p', 'mid-r', 'mid-q'},
 3: {'high-y', 'high-x', 'high-z'},
 4: {'super'}}
>>> G2 = DiGraph()
>>> G2.add_edges_from([('n10', 'n11'), ('n11', 'n12'), ('n12', 'n13'),
...                    ('n13', 'n14'), ('n20', 'n14'), ('n20', 'n21'),
...                    ('n21', 'n22'), ('n22', 'n23'), ('n30', 'n23'),
...                    ('n30', 'n31'), ('n31', 'n32')])
>>> pp.pprint(get_distance_nodes_map(G2))
{0: {'n32'},
 1: {'n31', 'n23'},
 2: {'n30', 'n22'},
 3: {'n21', 'n14'},
 4: {'n13', 'n20'},
 5: {'n12'},
 6: {'n11'},
 7: {'n10'}}

Upvotes: 2

Paul Brodersen
Paul Brodersen

Reputation: 13031

Untested pseudo-code because my lunch break is nearly over:

You have a multi-root tree, with one chosen principal root.

  1. For each root, create a subgraph consisting of all reachable nodes for that root.

  2. Starting with the principal root (root a), compute the distance/shortest path length to the root for all nodes in the corresponding subgraph A.

  3. Find all subgraphs that share at least one node with the principal subgraph, and select the subgraph (subgraph B) that has the node (node x) with the smallest distance to the principal root.

  4. Compute the distance to the root b for all nodes in subgraph B. Add the distance d(node x, root a). Subtract the distance d(node x, root b).

  5. Create the union of subgraph A and B. Repeat steps 3-5 until no roots remain.

  6. Subtract the maximum distance & reverse the sign such that principal root has the largest distance/order value.

Edit:

My pseudocode works (*). I blame user error. ;-)

#!/usr/bin/env python
"""
https://stackoverflow.com/q/66584661/
"""
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx


def hierarchical_layout(graph):

    longest_path = nx.algorithms.dag.dag_longest_path(graph)
    principal_root = longest_path[0]

    roots = [node for node, degree in list(graph.in_degree) if degree==0]
    subgraphs = {root : create_subgraph(graph, root) for root in roots}

    # Starting with the principal root (root a), compute the
    # longest path length to the root for all nodes in the
    # corresponding subgraph A.
    node_to_level = single_source_longest_dag_path_length(subgraphs[principal_root], principal_root)

    explored = subgraphs[principal_root]
    del subgraphs[principal_root]

    while len(explored) < len(graph):

        # Find all subgraphs that share at least one node with the
        # principal subgraph, and select the subgraph (subgraph B) that
        # has the node (node x) with the smallest distance to the
        # principal root.
        minimum_cost = np.inf
        minimum_cost_node = None
        minimum_cost_root = None
        for root, subgraph in subgraphs.items():
            for node in subgraph.nodes:
                if node in node_to_level:
                    if node_to_level[node] < minimum_cost:
                        minimum_cost = node_to_level[node]
                        minimum_cost_node = node
                        minimum_cost_root = root

        assert minimum_cost_node, "Could not find a connected subgraph."

        # Compute the distance to the root b for all nodes in subgraph
        # B. Add the distance d(node x, root a). Subtract the distance
        # d(node x, root b).
        path_lengths = [len(path) for path in nx.all_simple_paths(subgraphs[minimum_cost_root], minimum_cost_root, minimum_cost_node)]
        offset = np.max(path_lengths) - 1
        for node, distance in single_source_longest_dag_path_length(subgraphs[minimum_cost_root], minimum_cost_root).items():
            if not node in node_to_level:
                node_to_level[node] = distance + minimum_cost - offset

        # Create the union of subgraph A and B.
        explored = nx.compose(explored, subgraphs[minimum_cost_root])
        del subgraphs[minimum_cost_root]

    return node_to_level


def create_subgraph(G, node):
    # https://stackoverflow.com/a/45678930/2912349
    nodes = nx.single_source_shortest_path(G,node).keys()
    return G.subgraph(nodes)


def single_source_longest_dag_path_length(graph, s):
    # from AlaskaJoslin's comment to https://stackoverflow.com/a/60978007/2912349
    dist = dict.fromkeys(graph.nodes, -float('inf'))
    dist[s] = 0
    topo_order = nx.topological_sort(graph)
    for n in topo_order:
        for s in graph.successors(n):
            if dist[s] < dist[n] + 1:
                dist[s] = dist[n] + 1
    return dist


if __name__ == '__main__':

    # edge_list = [
    #     ("n10", "n11"),
    #     ("n11", "n12"),
    #     ("n12", "n13"),
    #     ("n13", "n14"),
    #     ("n20", "n21"),
    #     ("n20", "n14"),
    #     ("n21", "n22"),
    #     ("n22", "n23"),
    #     ("n30", "n23"),
    #     ("n30", "n31"),
    #     ("n31", "n32"),
    # ]

    edge_list = [
        ("low-a", "base-one"),
        ("low-c", "base-zero"),
        ("low-c", "base-one"),
        ("mid-p", "low-b"),
        ("mid-p", "low-c"),
        ("mid-q", "low-a"),
        ("high-x", "mid-p"),
        ("high-y", "mid-p"),
        ("high-y", "base-zero"),
        ("high-z", "mid-q"),
        ("high-z", "mid-r"),
        ("high-z", "base-one"),
        ("super", "high-x"),
    ]

    graph = nx.DiGraph()
    graph.add_edges_from(edge_list)

    node_to_level = hierarchical_layout(graph)

    # reverse output format
    distance_nodes_map = dict()
    max_distance = np.max(list(node_to_level.values()))
    for node, distance in node_to_level.items():
        reversed_distance = max_distance - distance
        if reversed_distance in distance_nodes_map:
            distance_nodes_map[reversed_distance].add(node)
        else:
            distance_nodes_map[reversed_distance] = set([node])

    # print(distance_nodes_map)
    for ii, nodes in sorted(distance_nodes_map.items())[::-1]:
        print(f"{ii} : {nodes}")

Yields:

    # 4 : {'super'}
    # 3 : {'high-x', 'high-y', 'high-z'}
    # 2 : {'mid-p', 'mid-r', 'mid-q'}
    # 1 : {'low-a', 'low-b', 'low-c'}
    # 0 : {'base-one', 'base-zero'}

(*) "subtract distance d(node x, root b)" naturally implied the longest path length between node x and root b. Obviously.

Upvotes: 1

Related Questions