Reputation: 38
I want to find the connected components in an undirected graph. However, I don't have an adjacency matrix. Instead I have a set of vertices as well as a function telling me whether two vertices are adjacent. What is the most efficient way to find all connected components?
I know I could just calculate the entire adjacency matrix and use depth-first search to find all components. But that would not be very efficient as I'd need to check every pair of vertices.
What I'm currently doing is the following procedure:
This is the pseudocode:
connected_components(vertices):
// vertices belonging to the current component and whose neighbors have not yet been added
vertices_to_check= [vertices.pop()]
// vertices belonging to the current component
current_component = []
components = []
while (vertices.notEmpty() or vertices_to_check.notEmpty()):
if (vertices_to_check.isEmpty()): // All vertices in the current component have been found
components.add(current_component)
current_component = []
vertices_to_check= [vertices.pop()]
next_vertex = vertices_to_check.pop()
current_component.add(next_vertex)
for vertex in vertices: // Find all neighbors of next_vertex
if (vertices_adjacent(vertex, next_vertex)):
vertices.remove(vertex)
vertices_to_check.add(vertex)
components.add(current_component)
return components
I understand that this method is faster than calculating the adjacency matrix in most cases, as I don't need to check whether two vertices are adjacent, if it is already known they belong to the same component. But is there a way to improve this algorithm?
Upvotes: 0
Views: 2028
Reputation: 183544
Ultimately, any algorithm will have to call vertices_adjacent
for every single pair of vertices that turn out to belong to separate components, because otherwise it will never be able to verify that there's no link between those components.
Now, if a majority of vertices all belong to a single component, then there may not be too many such pairs; but unless you expect a majority of vertices all belong to a single component, there's little point optimizing specifically for that case. So, discarding that case, the very best-case scenario is:
vertices_adjacent
for each of those pairs.. . . which still involves making ¼|V|2 + |V| − 2 calls to vertices_adjacent
. By comparison, the build-an-adjacency-list approach makes ½|V|2 − ½|V| calls — which is more than the best-case scenario, but by a factor of less than 2. (And the worst-case scenario is simply equivalent to the build-an-adjacency-list approach. That would happen if no component contains more than two vertices, or if the graph is acyclic and you get unlikely in your choice of edges to check for first. Most graphs will be somewhere in between.)
So it's probably not worth trying to optimize too closely for the exact minimum number of calls to vertices_adjacent
.
That said, your approach seems pretty reasonable to me; it doesn't make any calls to vertices_adjacent
that are clearly unnecessary, so the only improvement would be a probabilistic one, if it could do a better job guessing which calls will turn out to be useful for eliminating later calls.
One possibility: in many graphs, there are some vertices that have a lot of neighbors and some vertices that have relatively few, according to a power-law distribution. So if you prioritize vertices based on how many neighbors they're already known to have, you may be able to take advantage of that pattern. (I think this is especially likely to be useful if the majority of vertices really do all belong to a single component, which is the only case where a better-than-factor-of-2 improvement is even conceivable.) But, you'll have to test to see if it actually makes a difference for the graphs you're interested in.
Upvotes: 3