Reputation: 33509
I want to prove that two data processing functions are functionally equivalent. To do this I built an abstract syntax tree for each function and then ran a simulation of the control flow to construct a data flow graph of the final output.
The data flow graph is represented by 3-tuples of (operation, operanda, operandb) such that (a+b)*2 would be represented as:
('*',('+','a','b'),'2')
Some of my operations are commutative, so the data flow from an equivalent graph might be:
('*','2',('+','b','a'))
How can I check that my 2 data flow graphs are isomorphic (i.e. perform exactly the same operations)?
My idea was to try and turn each data flow graph into a canonical form by detecting commutative operators and placing their operands into sorted order (e.g. lexicographical order although I don't think the order matters as long as I am consistent). I can then compare the reordered tuples for strict equality.
For my graphs I think this algorithm should be sufficient even though it would not spot equivalence between things like (a+b)+c and a+(b+c).
However, I ran into an efficiency problem even for small graphs.
For example, this Python code constructs a trivial data flow graph with just 28 operations, but it takes Python 8 seconds to compare the tuples (and larger graphs get exponentially worse):
from time import time
def make_dataflow_graph(n):
A='y'
for i in range(n):
A=('+',A,A)
return A
G1 = make_dataflow_graph(28)
G2 = make_dataflow_graph(28)
t0 = time()
print G1<G2
print time()-t0
I suppose the problem is that Python does the comparison recursively so it is wasting lots of time comparing the same nodes again and again.
Is there some Pythonic way to make a tuple comparison more efficient, or is there some better algorithm for comparing my data flow graphs?
Upvotes: 1
Views: 441
Reputation: 65498
I can't speak to whether it's idiomatic Python, but you could borrow an idea from Lisp called hash consing. In a sentence, the idea is to use a hash table to increase subobject sharing to the max, allowing tuples to be compared by identity instead of deeply.
Something like this:
canonical_ids = set()
canonical_objs = {}
canonical_tuples = {}
def canonical_object(obj):
if id(obj) in canonical_ids:
return obj
if isinstance(obj, tuple):
operator, left_operand, right_operand = map(canonical_object, obj)
if operator in {'+', '*'} and id(left_operand) > id(right_operand):
left_operand, right_operand = right_operand, left_operand
obj = operator, left_operand, right_operand
canon_obj = canonical_tuples.setdefault(tuple(map(id, obj)), obj)
else:
canon_obj = canonical_objs.setdefault(obj, obj)
canonical_ids.add(id(canon_obj))
return canon_obj
Then you can do things like
canonical_object(obj1) is canonical_object(obj2)
Upvotes: 2