Reputation: 81862
We have a rule that disallows cyclic dependencies between packages.
We also have a rather huge package that needs some splitting.
The question is: How can I identify a/all/maximum subset of classes, that can be extracted from the package into a new package without creating a cyclic dependency.
Is there a well known algorithm for that?
A variation would be awesome in which on can define a maximum number of dependencies that can be ignored by the algorithm.
Rather obviously the subset(s) should be not identical to the package, nor empty. In case of a maximum subset it should be smaller than one half of the original package.
Upvotes: 3
Views: 500
Reputation: 726
Basically, your classes, objects, or what have you, are stored in a matrix (called adjacency matrix) that represents a directed graph (with or without cycles). See the graph below and the corresponding adjacency matrix.
From this, we can calculate the reachability matrix, which describes to which nodes can one travel from the current node. For this graph, the reachability matrix is
You need an algorithm that rearranges the rows and the columns of the matrix, so that all non-zero elements are below the main diagonal. A sequence of object indexes for which this is true can be executed in the order in which they appear in the matrix, and all necessary dependencies for each object would be satisfied. If the graph is known to be acyclic, this can be achieved by topological sorting.
When cycles appear in the directed graph, you won't be able to find an ordering for which this is true.
Enter Design/Dependency Structure Matrix (DSM). A so called partitioning algorithm can be implemented to divide the objects into levels. For each of those levels, the objects can be executed in arbitrary order, and are not dependent one or another. For the graph above, nodes 3, 4 and 5 are not dependent on each other and can be executed in any order.
A partitioning algorithm has been developed in (Warfield 1973), which is able to detect and isolate cycles in the DSM. This is similar to the topological sorting algorithm, but with usage of the reachability matrix to detect and isolate cycles.
The algorithm briefly:
- Create a new partition level
- Calculate the reachability and the antecedent sets R(s) and A(s)
- For each element in the DSM, calculate the set product R(s)A(s)
- If R(s)A(s)=R(s), then add the element
s
to the current level- Remove element
s
from the list, and all references to it from the reachability and antecedent sets of all other elements.- Repeat from 1 if the item list is not empty.
The antecedent set A(s) is the set of row indices of non-zero elements in column s
, while the reachability set R(s) is the set of column indices of the non-zero elements of s
.
Finally, some pseudocode (in VB.NET, no less):
CalculateInitialAntecedentSets()
CalculateInitialReachabilitySets()
While UnlabelledItems > 0
Sequence.AddNewPartitionLevel()
For Each s In ReachabilityMatrix
If NoDependencies(s) and AlreadyConsidered(s) Then
AddToLevel(CurrentLevel, s)
End If
Next
RemoveDependencies(ReachabilitySets, Sequence.Level(CurrentLevel))
RemoveDependencies(AntecedentSets, Sequence.Level(CurrentLevel))
UpdateConsideredList(Sequence.Level(CurrentLevel))
Unlabelled = Unlabelled - Sequence.Level(CurrentLevel).Count
CurrentLevel = CurrentLevel + 1
End While
(This was the topic of my Master thesis some years ago)
Warfield, John N. (1973), `Binary matrices in system modelling', IEEE Transactions on Systems, Man, and Cybernetics SMC-3(5), 441--449.
Upvotes: 3
Reputation: 15997
The algorithm you're looking for is topological sorting. Simply extract items until you encounter a cycle.
Upvotes: 0
Reputation: 16256
Just an idea:
Build a directed graph where your classes are nodes and dependencies are edges. Detect all strongly connected components. Calculate their weight (= number of nodes/classes). Now you have balanced partition problem - to partition a set of component weights into two subsets with minimal differences between their sums.
Upvotes: 1