Reputation: 8491
I'm analyzing the control flow graph of a function which basically maps incoming data onto outgoing data. Most of the blocks are like this one:
if (input_variable == SPECIFIC_CONSTANT) {
output_variable = TRUE;
}
else {
output_variable = FALSE;
}
Typical Control Flow Graph for such code looks like the following graph
digraph G {
2 -> 3 -> 5;
2 -> 4 -> 5;
}
where execution of 3
and 4
are conditioned by the value of the input_variable
but 5
is independent.
Given a directed graph and a start node, how do I find the nearest node such that any path from the start node goes through this node?
Example: Given this graph how do I find 6
starting from 2
or 12
starting from 8
?
Can I reverse a Lowest Common Ancestor algorithm and would it be efficient? Like
for each node in graph:
ancestors = node.get_all_ancestors()
lca = find_lowest_common_ancestor(ancestors)
junction_node[lca] = node
get_junction_point(node):
return junction_node[node]
My programming language is Python and I just discovered NetworkX, but any algorithm would be appreciated. I am not accustomed to graph theory and I think I miss the basic glossary to find what I'm looking for.
Thanks for your help!
Upvotes: 9
Views: 681
Reputation: 8491
Reading at all your proposed solutions, I came up with an idea. I give my first node an amount of 1. Recursively, all children receive an equal fraction of this amount. In their turn, they dispatch this amount downwards. If a child receives a total of 1 (the start amount) then it is the "junction point". Here is my implementation (open for comments !!).
I hope the BFS-construct limitates the number of node visited.
class Node(object):
"""
Object representing a node in a graph where we search the junction node that
concentrates all paths starting from a start node.
``reference``: Attaches the real object that contains the relationships.
``initial_value``: Initial amount for the node. Typically 1 for the start
node, 0 for the others.
``path``: Set of already traversed nodes that reached the node. Used to
prune circular dependencies.
"""
def __init__(self, reference, initial_value, path=set()):
self.reference = reference
# See dispatch() for explaination
self.value_can_dispatch = self.value_has_received = initial_value
self.path = path
def receive(self, value):
"""
Give ``value`` to the node. If the node received 1 (or more for security)
in total, it will return True. Else it returns False.
"""
self.value_has_received += value
self.value_can_dispatch += value
if self.value_has_received >= 1.:
return True
return False
def dispatch(self, children):
"""
Dispatch the value received to the children.
Returns a filtered list of ``children`` where children involved in a
circular dependency are removed.
If one child signals that it has received a total of 1, the function will
stop and return this one child.
"""
# Filter successors that are in the path used to access this node so to cut
# out cycles
true_successors = [child for child in children if child not in self.path]
# Cut the received value into equal pieces
amount = self.value_can_dispatch/len(true_successors)
# We transmit only the value received after the last time it was dispatched
# because paths may lead to the same node at different iterations (path
# through one node may be longer than through another) and thus the same
# node can be asked to dispatch to its children more than once.
# The total amount of received value is kept in value_has_received because
# the node may receive the complete amount in several steps. Thus, part of
# its value may have been dispatched before we notice that the node received
# the total amount of 1.
self.value_can_dispatch = Fraction(0)
for child in true_successors:
# If child signaled that he received 1, then raise the winner
if child.receive(amount):
return child
return set(true_successors)
def touch(self, other_node):
"""
"Touches" a node with another, notifying that the node is reachable
through another path than the known ones.
It adds the elements of the new path as ancestors of the node.
"""
self.path |= other_node.path | {other_node}
def make_child(self, reference):
"""
Creates a child of the node, pointing to reference. The child receives
its path from the current node.
"""
# This is were the algorithm can go mad. If child is accessed through two
# paths, the algorithm will only protect recursion into the first
# path. If the successors recurse into the second path, we will not detect
# it. => We should update the child's path when a second path reaches it.
return self.__class__(reference, Fraction(0), self.path | {self})
def __repr__(self):
return "<{} {}>".format(self.__class__.__name__, self.reference)
def find_junction_node(first_reference, get_uid, get_successors, max_iterations=100):
"""
Find the junction node of all paths starting from ``first_reference`` in
a directed graph. ``get_uid`` is a function accepting a reference to a node
in your graph and returning a unique identifier for this reference.
``get_successors`` is a function accepting a reference to a node in your
graph. It should return a list of references to all its the children nodes.
It may return None if the node has no child.
``max_iterations`` limits the number of pass the algorithm use to find the
junction node. If reached, the funciton raises a RuntimeError.
Returns ``(jp, ln)`` where ``jp`` is a reference to a node in your graph
which is the junction node and ``ln`` the list of nodes in the subgraph
between the start node and the junction node.
"""
# Mapping to already created nodes
nodes = {}
# Initialise first node with an amount of 1
node = Node(first_reference, Fraction(1, 1))
nodes[get_uid(first_reference)] = node
# Initialise first iteration of DFS
successors = set()
successors.add(node)
# Max iteration provides security as I'm not sure the algorithm cannot loop
for i in range(max_iterations):
next_successors = set()
# Process one level of nodes
for node in successors:
# Find successors in data graph
sub_references = get_successors(node.reference)
# This happens when we reach the end of the graph, node has no children
if sub_references is None:
continue
# Make a list of Node that are children of node
children = set()
for reference in sub_references:
uid = get_uid(reference)
# Does it exist?
child = nodes.get(uid, None)
if not child:
child = node.make_child(reference)
nodes[uid] = child
else:
child.touch(node)
children.add(child)
# Dispatch the value of node equally between its children
result = node.dispatch(children)
#print("Children of {}: {!r}".format(node, result)) # DEBUG
# If one child received a total of 1 from its parents, it is common to
# all paths
if isinstance(result, Node):
return result.reference, [node.reference for node in result.path]
# Else, add the filtered list of children to the set of node to process
# in the next level
else:
next_successors |= result
successors = next_successors
# Reached end of graph by all paths without finding a junction point
if len(successors) == 0:
return None
raise RuntimeError("Max iteration reached")
Upvotes: 0
Reputation: 114035
Not the most efficient solution, but here's something that should get you started:
Do a DFS, then compute the intersection of all paths (nodes that exist in every path). Then, among those nodes, find the one that appears closest to the beginning in every path:
>>> paths
[]
>>> def dfs(G, s, path):
... if s not in G or not G[s]:
... paths.append(path)
... else:
... for t in G[s]:
... dfs({k:[_t for _t in v if _t!=t] for k,v in G.items()}, t, path+[t])
...
>>> dfs(G, 2, [])
>>> paths
[[3, 4, 6, 7, 12], [3, 4, 6, 8, 9, 10, 12], [3, 4, 6, 8, 9, 12], [3, 4, 6, 8, 11, 12], [3, 5, 6, 7, 12], [3, 5, 6, 8, 9, 10, 12], [3, 5, 6, 8, 9, 12], [3, 5, 6, 8, 11, 12], [4, 6, 7, 12], [4, 6, 8, 9, 10, 12], [4, 6, 8, 9, 12], [4, 6, 8, 11, 12]]
>>> for path in paths: print(path)
...
[3, 4, 6, 7, 12]
[3, 4, 6, 8, 9, 10, 12]
[3, 4, 6, 8, 9, 12]
[3, 4, 6, 8, 11, 12]
[3, 5, 6, 7, 12]
[3, 5, 6, 8, 9, 10, 12]
[3, 5, 6, 8, 9, 12]
[3, 5, 6, 8, 11, 12]
[4, 6, 7, 12]
[4, 6, 8, 9, 10, 12]
[4, 6, 8, 9, 12]
[4, 6, 8, 11, 12]
>>> nodes = [set(L) for L in paths]
>>> commons = functools.reduce(set.intersection, nodes)
>>> commons
{12, 6}
>>> min(commons, key=lambda v: max(L.index(v) for L in paths))
6
Now, note how 6
shows up at index 2 in some paths and at index 1 in some other paths. If there was a node (say x), that showed up at index 1 in the paths where 6 shows up at index 2, and at index 2 where 6 shows up at index 1, then, that would be a tie, which this algorithm would break arbitrarily. Depending on your needs, you might want to define how to handle this case better
Upvotes: 2
Reputation: 7817
You can do something like this:
For each node find the list of all its ancestors and descendants. If Size(ancestors) + size(descendants) + 1 is equal to network size, it is a candidate. Now, find such a node with at at least one ancestor and maximum number of descendants.
The list of ancestors and descendants can be calculated pretty easily. Let me know if you are not sure and I will expand my answer.
Upvotes: 0