Reputation: 2785
I am trying to find all paths in a directed acyclic graph (DAG) with multiple starting nodes. I already have the DAG represented as a list of lists, together with the levels of nodes from the start nodes to the terminal nodes:
dag = [[4, 6, 7], [5, 6, 7], [4, 5, 6], [4, 5, 7],
       [], [], [], [], [1, 3], [1, 2], [0, 2, 3]]
levels = [[8, 9, 10], [0, 1, 2, 3], [4, 5, 6, 7]]
The example DAG colored by levels looks like this:
Since I have 3 starting nodes [8, 9, 10], my first thought is to perform 3 DFSs. Below is my code:
def get_all_paths(dag, sources):
    def dfs(data, path, paths=None):
        if paths is None:
            paths = []
        prev = path[-1]
        if data[prev]:
            for val in data[prev]:
                new_path = path + [val]
                paths = dfs(data, new_path, paths)
        else:
            paths += [path]
        return paths

    all_paths = []
    for i in sources:
        paths = dfs(data=dag, path=[i], paths=[])
        all_paths += paths
    return all_paths

all_paths = get_all_paths(dag, sources=levels[0])
print(all_paths)
Output:
[[8, 1, 5], [8, 1, 6], [8, 1, 7], [8, 3, 4], [8, 3, 5], [8, 3, 7],
[9, 1, 5], [9, 1, 6], [9, 1, 7], [9, 2, 4], [9, 2, 5], [9, 2, 6],
[10, 0, 4], [10, 0, 6], [10, 0, 7], [10, 2, 4], [10, 2, 5],
[10, 2, 6], [10, 3, 4], [10, 3, 5], [10, 3, 7]]
%timeit get_all_paths(dag, sources=levels[0])
24.6 µs ± 577 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
However, when the graph is large or there are many starting nodes, my code becomes very slow. As we all know, the time complexity of a DFS on a DAG G(V, E) is O(V + E), so my approach has a time complexity of O(m(V + E)), where m is the number of starting nodes. Each node is visited m times, but I feel it should be possible to visit each node only once, especially since I have the levels and am not making use of them. Maybe a BFS will do, but I'm not sure how to write it. Any suggestions?
Here are some benchmarks of the answers.
I have adjusted my DFS code based on @chepner's answer.
def get_all_paths(dag, sources):
    def dfs(data, path, paths=None):
        if paths is None:
            paths = []
        prev = path[-1]
        if data[prev]:
            for val in data[prev]:
                new_path = path + [val]
                paths = dfs(data, new_path, paths)
        else:
            # Strip the virtual source node from the front of the finished path.
            paths.append(path[1:])
        return paths

    # Append a virtual source that points at all real starting nodes.
    # Note that this mutates dag, so repeated calls keep growing it;
    # dag + [sources] would avoid the side effect.
    dag.append(sources)
    return dfs(data=dag, path=[len(dag) - 1], paths=[])
%timeit get_all_paths(dag, sources=levels[0])
Output:
9.73 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Testing @aminrd's answer:
from collections import defaultdict

def aminrd(dag, levels):
    paths = defaultdict(list)
    levels_reversed = levels[::-1]
    for node in levels_reversed[0]:
        paths[node] = [[node]]
    for lvl in levels_reversed[1:]:
        for src in lvl:
            for dst in dag[src - 1]:
                paths[src] += [[src] + p for p in paths[dst]]
    result = []
    for lvl_0 in levels[0]:
        result += paths[lvl_0]
    return result
%timeit aminrd(dag, levels)
Output:
10.7 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Upvotes: 2
Views: 1744
Reputation: 5010
By starting from the nodes in the last level, you can build a hash that maps each node to the paths below it, filling it in from the leaves first and then coming up level by level. I also guess the dag that you have in the question is not exactly the same as the figure you posted:
from collections import defaultdict

dag = [[5, 4, 6, 7], [5, 6, 7], [4, 5, 6], [4, 5, 7],
       [], [], [], [1, 3], [1, 2], [0, 2, 3]]
levels = [[8, 9, 10], [0, 1, 2, 3], [4, 5, 6, 7]]

paths = defaultdict(list)
levels_reversed = levels[::-1]

for node in levels_reversed[0]:
    paths[node] = [[node]]

for lvl in levels_reversed[1:]:
    for src in lvl:
        for dst in dag[src - 1]:
            paths[src] += [[src] + p for p in paths[dst]]

result = []
for lvl_0 in levels[0]:
    result += paths[lvl_0]
Result:
[[8, 1, 5], [8, 1, 4], [8, 1, 6], [8, 1, 7],
[8, 3, 4], [8, 3, 5], [8, 3, 6], [9, 1, 5],
[9, 1, 4], [9, 1, 6], [9, 1, 7], [9, 2, 5],
[9, 2, 6], [9, 2, 7], [10, 2, 5], [10, 2, 6],
[10, 2, 7], [10, 3, 4], [10, 3, 5], [10, 3, 6]]
Upvotes: 2
Reputation: 531205
Look for paths in a single augmented graph. If your set of starting nodes is S, add a new starting node S0 and edges (S0, s) for each s in S. Once your single DFS completes, you can simply remove S0 from each path, leaving you with the paths in your original graph.
This may reduce some of the duplication created by running dfs multiple times, but won't address the unavoidable fact that your running time is proportional to the number of paths to be found.
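A minimal sketch of the augmented-graph idea, using the dag from the question (the function name and internals here are illustrative, not code from the answer):

```python
def all_paths_single_dfs(dag, sources):
    # Augment the graph with one virtual source S0 that points at every
    # real starting node, then run a single DFS from S0.
    s0 = len(dag)
    aug = dag + [list(sources)]
    paths = []

    def dfs(path):
        node = path[-1]
        if aug[node]:
            for nxt in aug[node]:
                dfs(path + [nxt])
        else:
            # Drop the virtual S0 from the front of each finished path.
            paths.append(path[1:])

    dfs([s0])
    return paths

dag = [[4, 6, 7], [5, 6, 7], [4, 5, 6], [4, 5, 7],
       [], [], [], [], [1, 3], [1, 2], [0, 2, 3]]
print(all_paths_single_dfs(dag, [8, 9, 10]))  # the same 21 paths as the question's output
```

Building `aug` as a fresh list avoids mutating the caller's dag between repeated timing runs.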
Upvotes: 3