Why is BFS O(n+m)?

enter image description here

The first while loop needs to visit all nodes in the worse case. This is n. For each visiting node, it needs to check all adjacent nodes/edges. I think this should be O(n*maximum deg(u)). Why are all answers found in google say that you just need to visit each node and edge once so it's O(n+m)? Visiting adjacent nodes/edges would visit repeat/visited nodes. You just don't add them to the list if they're visited. I think this still has a runtime. For example: a->b->c We start from a, check adjacent node b, add c to the list. Then go to b. Then from b to check a and c, a is visited, so add c to the list, then go to c. When exploring from b, we need O(2)/O(deg(b).

Upvotes: 2

Answers (2)

Abhinav Mathur

Reputation: 8101

Think of it this way:

Having 2 nested loops doesn't necessarily mean that their complexities will be multiplied
The outer loop iterates through all nodes, so that's O(n) complexity. No node is added more than once to the queue.
For a given node, we iterate through all it's neighbours. For any node, number of neighbours = degree of the node. Since we effectively iterate through all neighbours for each node once, this has a total complexity of O(sum of all node's degree). Since the sum of all degrees in a graph is 2*m where m is the number of edges, complexity = O(2*m) = O(m).

Total complexity = O(n+m)

Upvotes: 5

templatetypedef

Reputation: 372814

The reasoning you're following goes like this:

The outer loop in BFS runs O(n) times.
In the worst case, the inner loop in BFS runs O(maxdegree) times.
Therefore, the total work done is O(n * maxdegree).

This is a sound line of reasoning to follow, but it's an overestimate of the total amount of work done because you're assuming that, each time we process a node, we always do the maximum possible amount of work to process that node. However, that isn't necessarily the case. For example, consider a star graph, where you have a single central node with k other nodes adjacent just to it, like this:

       *
     * | *
      \|/
    *--*--*
      /|\
     * | *
       *

Here, the maximum degree of any one node is 8, but only a single node has that degree. All the other nodes have degree one, so assuming that on each iteration when we process a node we'll look at all eight neighbors will overcount the amount of work done.

There are two ways you can adjust the analysis to get the runtime of O(n + m). The first one is the "classical" way to do this. To analyze the inner loop, instead of multiplying its number of iterations in the worst case by the total number of iterations of the outer loop, let's add up how much work the inner loop does for the first node, plus the work it does for the second node, etc. Across all the nodes, that inner loop will visit each edge exactly once (for a directed graph) or twice (for an undirected graph). Therefore, across all iterations of the outer loop, the inner loop does O(m) work to visit all the nodes. That, plus the O(1) remaining work from the outer loop run O(n) times, gives you the O(m + n) runtime.

The other way to do this is to reason about the average work done per node in the inner loop, rather than the maximum work done per node in the inner loop. On average, each node is adjacent to m / n other nodes, since there are m edges and n nodes to distribute them across. Therefore, one iteration of the outer loop does O(1 + m / n) work, on average, and since there are n iterations of the outer loop the total work done is O(n + m).

Upvotes: 0

Why is BFS O(n+m)?

Answers (2)

Related Questions