Reputation: 2785
I am trying to find all paths in a directed acyclic graph (DAG) with multiple starting nodes. I already have the DAG represented as a list of lists, together with the levels of nodes from the start nodes to the terminal nodes:
dag = [[4, 6, 7], [5, 6, 7], [4, 5, 6], [4, 5, 7],
       [], [], [], [], [1, 3], [1, 2], [0, 2, 3]]
levels = [[8, 9, 10], [0, 1, 2, 3], [4, 5, 6, 7]]
The example DAG colored by levels looks like this:
Since I have 3 starting nodes [8, 9, 10], my first thought is to perform 3 DFSs. Below is my code:
def get_all_paths(dag, sources):
    def dfs(data, path, paths=None):
        if paths is None:
            paths = []
        prev = path[-1]
        if data[prev]:
            for val in data[prev]:
                new_path = path + [val]
                paths = dfs(data, new_path, paths)
        else:
            paths += [path]
        return paths

    all_paths = []
    for i in sources:
        paths = dfs(data=dag, path=[i], paths=[])
        all_paths += paths
    return all_paths

all_paths = get_all_paths(dag, sources=levels[0])
print(all_paths)
Output:
[[8, 1, 5], [8, 1, 6], [8, 1, 7], [8, 3, 4], [8, 3, 5], [8, 3, 7],
[9, 1, 5], [9, 1, 6], [9, 1, 7], [9, 2, 4], [9, 2, 5], [9, 2, 6],
[10, 0, 4], [10, 0, 6], [10, 0, 7], [10, 2, 4], [10, 2, 5],
[10, 2, 6], [10, 3, 4], [10, 3, 5], [10, 3, 7]]
%timeit get_all_paths(dag, sources=levels[0])
24.6 µs ± 577 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
However, when the graph is large or there are many starting nodes, my code becomes very slow. As we all know, the time complexity of a DFS on a DAG G(V, E) is O(V + E), so my approach has a time complexity of O(m(V + E)), where m is the number of starting nodes. Each node is visited m times, but I feel it should be possible to visit each node only once, especially since I have the levels and am not making use of them. Maybe a BFS will do, but I'm not sure how to write it. Any suggestions?
Here are some benchmarks of the answers.
I have adjusted my DFS code based on @chepner's answer.
def get_all_paths(dag, sources):
    def dfs(data, path, paths=None):
        if paths is None:
            paths = []
        prev = path[-1]
        if data[prev]:
            for val in data[prev]:
                new_path = path + [val]
                paths = dfs(data, new_path, paths)
        else:
            # Strip the virtual source node from the front of the finished path.
            paths.append(path[1:])
        return paths

    # Append a virtual source that points at all real starting nodes.
    # Note that this mutates dag, so repeated calls keep growing it;
    # dag + [sources] would avoid the side effect.
    dag.append(sources)
    return dfs(data=dag, path=[len(dag) - 1], paths=[])
%timeit get_all_paths(dag, sources=levels[0])
Output:
9.73 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Testing @aminrd's answer:
from collections import defaultdict

def aminrd(dag, levels):
    paths = defaultdict(list)
    levels_reversed = levels[::-1]
    for node in levels_reversed[0]:
        paths[node] = [[node]]
    for lvl in levels_reversed[1:]:
        for src in lvl:
            for dst in dag[src - 1]:
                paths[src] += [[src] + p for p in paths[dst]]
    result = []
    for lvl_0 in levels[0]:
        result += paths[lvl_0]
    return result
%timeit aminrd(dag, levels)
Output:
10.7 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Upvotes: 2
Views: 1744
Reputation: 5010
By starting from the nodes in the last level, you can build a hash that maps each node to the paths below it, filling it in from the leaves first and then coming up level by level. I also guess the dag that you have in the question is not exactly the same as the figure you posted:
from collections import defaultdict

dag = [[5, 4, 6, 7], [5, 6, 7], [4, 5, 6], [4, 5, 7],
       [], [], [], [1, 3], [1, 2], [0, 2, 3]]
levels = [[8, 9, 10], [0, 1, 2, 3], [4, 5, 6, 7]]

paths = defaultdict(list)
levels_reversed = levels[::-1]

for node in levels_reversed[0]:
    paths[node] = [[node]]

for lvl in levels_reversed[1:]:
    for src in lvl:
        for dst in dag[src - 1]:
            paths[src] += [[src] + p for p in paths[dst]]

result = []
for lvl_0 in levels[0]:
    result += paths[lvl_0]
Result:
[[8, 1, 5], [8, 1, 4], [8, 1, 6], [8, 1, 7],
[8, 3, 4], [8, 3, 5], [8, 3, 6], [9, 1, 5],
[9, 1, 4], [9, 1, 6], [9, 1, 7], [9, 2, 5],
[9, 2, 6], [9, 2, 7], [10, 2, 5], [10, 2, 6],
[10, 2, 7], [10, 3, 4], [10, 3, 5], [10, 3, 6]]
Upvotes: 2
Reputation: 531205
Look for paths in a single augmented graph. If your set of starting nodes is S, add a new starting node S0 and edges (S0, s) for each s in S. Once your single DFS completes, you can simply remove S0 from each path, leaving you with the paths in your original graph.
This may reduce some of the duplication created by running dfs multiple times, but won't address the unavoidable fact that your running time is proportional to the number of paths to be found.
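A minimal sketch of the augmented-graph idea, using the dag from the question (the function name and internals here are illustrative, not code from the answer):

```python
def all_paths_single_dfs(dag, sources):
    # Augment the graph with one virtual source S0 that points at every
    # real starting node, then run a single DFS from S0.
    s0 = len(dag)
    aug = dag + [list(sources)]
    paths = []

    def dfs(path):
        node = path[-1]
        if aug[node]:
            for nxt in aug[node]:
                dfs(path + [nxt])
        else:
            # Drop the virtual S0 from the front of each finished path.
            paths.append(path[1:])

    dfs([s0])
    return paths

dag = [[4, 6, 7], [5, 6, 7], [4, 5, 6], [4, 5, 7],
       [], [], [], [], [1, 3], [1, 2], [0, 2, 3]]
print(all_paths_single_dfs(dag, [8, 9, 10]))  # the same 21 paths as the question's output
```

Building `aug` as a fresh list avoids mutating the caller's dag between repeated timing runs.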
Upvotes: 3