muruDiaz
muruDiaz

Reputation: 127

How to create a DAG from a list in python

I am using networkx to manually input the (u, v, weights) to a graph. But when the input gets bigger this manual insertion of nodes and edges will become a really tiresome task and prone to errors. I'm trying but haven't figured out that how to perform this task without manual labour.

Sample Input:

my_list = ["s1[0]", "d1[0, 2]", "s2[0]", "d2[1, 3]", "d3[0, 2]", "d4[1, 4]", "d5[2, 3]", "d6[1, 4]"]

Manual Insertion:

Before inserting nodes into a graph I need to number them, so first occurrence of 's' or 'd' can be differentiate from later similar characters e.g. s1,s2,s3,... and d1,d2,d3,... I am aware it is something similar to SSA form (compilers) but I was not able to find something helpful for my case.

Manually inserting (u, v, weights) to a DiGraph()

my_graph.add_weighted_edges_from([("s1", "d1", 0), ("d1", "s2", 0), ("d1", "d3", 2), ("s2", "d3", 0), (
    "d2", "d4", 1), ("d2", "d5", 3), ("d3", "d5", 2), ("d4", "d6", 1), ("d4", "d6", 4)])

Question:

How to automatically convert that input list(my_list) into a DAG(my_graph), avoiding manual insertion?

Complete Code: This is what I have written so far.

import networkx as nx
from networkx.drawing.nx_agraph import write_dot, graphviz_layout
from matplotlib import pyplot as plt

my_graph = nx.DiGraph()
my_graph.add_weighted_edges_from([("s1", "d1", 0), ("d1", "s2", 0), ("d1", "d3", 2), ("s2", "d3", 0), (
    "d2", "d4", 1), ("d2", "d5", 3), ("d3", "d5", 2), ("d4", "d6", 1), ("d4", "d6", 4)])


write_dot(my_graph, "graph.dot")

plt.title("draw graph")
pos = graphviz_layout(my_graph, prog='dot')



nx.draw(my_graph, pos, with_labels=True, arrows=True)

plt.show()
plt.clf()

Explanation:

  1. 's' and 'd' are some instructions that requires 1 or 2 registers respectively, to perform an operation.

  2. In above example we have 2 's' operations and 6 'd' operations and there are five registers [0,1,2,3,4].

  3. Each operation will perform some calculation and store the results in relevant register/s.

  4. From input we can see that d1 uses register 0 and 2, so it cannot operate until both of these registers are free. Therefore, d1 is dependent on s1 because s1 comes before d1 and is using register 0. As soon as s1 finishes d1 can operate as register 2 is already free.

  5. E.g. We initialize all registers with 1. s1 doubles its input while d1 sums two inputs and store the result in it's second register:

so after s1[0] reg-0 * 2 -> 1 * 2 => reg-0 = 2

and after d1[0, 2] reg-0 + reg-2 -> 2 + 1 => reg-0 = 2 and reg-2 = 3

Update 1: The graph will be a dependency-graph based on some resources [0...4], each node will require 1(for 's') or 2(for 'd') of these resources.

Update 2: Two questions were causing confusion so I'm separating them. For now I have changed my input list and there is only a single task of converting that list into a DAG. I have also included an explanation section.

PS: You might need to pip install graphviz if you don't already have it.

Upvotes: 0

Views: 2001

Answers (1)

Tadhg McDonald-Jensen
Tadhg McDonald-Jensen

Reputation: 21453

Ok now that I have a better idea of how the mapping works, it just comes down to describing the process in code, keeping a mapping of which op is using which resource and as iterating over the operations if it uses a resource used by the previous operation we generate an edge. I think this is along the lines of what you are looking for:

import ast
class UniqueIdGenerator:
    def __init__(self, initial=1):
        self.auto_indexing = {}
        self.initial = initial
    def get_unique_name(self, name):
        "adds number after given string to ensure uniqueness."
        if name not in self.auto_indexing:
            self.auto_indexing[name] = self.initial
        unique_idx = self.auto_indexing[name]
        self.auto_indexing[name] += 1
        return f"{name}{unique_idx}"

def generate_DAG(source):
    """
    takes iterable of tuples in format (name, list_of_resources) where
    - name doesn't have to be unique
    - list_of_resources is a list of resources in any hashable format (list of numbers or strings is typical)
    
    generates edges in the format (name1, name2, resource),
    - name1 and name2 are unique-ified versions of names in input
    - resource is the value in the list of resources
    each "edge" represents a handoff of resource, so name1 and name2 use the same resource sequentially.
    """
    # format {resource: name} for each resource in use.
    resources = {}
    g = UniqueIdGenerator()
    for (op, deps) in source:
        op = g.get_unique_name(op)
        for resource in deps:
            # for each resource this operation requires, if a previous operation used it
            if resource in resources:
                # yield the new edge
                yield (resources[resource], op, resource)
            # either first or yielded an edge, this op is now using the resource.
            resources[resource] = op

my_list = ["s[0]", "d[0, 2]", "s[0]", "d[1, 3]", "d[0, 2]", "d[1, 4]", "d[2, 3]", "d[1, 4]"]
data = generate_DAG((a[0], ast.literal_eval(a[1:])) for a in my_list)
print(*data, sep="\n")

Upvotes: 1

Related Questions