pjpj

Reputation: 441

Weighted quick-union with path compression algorithm: time complexity analysis

I'm learning the "Weighted quick-union with path compression" algorithm for a union/find structure. The Princeton edu site explains the algorithm in detail. Here is the implementation in Java:

public class WQUPC {
  private int[] id;
  private int[] sz;
  public WQUPC(int N) {
    id = new int[N];
    sz = new int[N];
    for (int i = 0; i < N; i++) {
      id[i] = i;
      sz[i] = 1;
    }
  }

  // chase parent pointers to the root, compressing the path along the way
  int root(int i) {
    while (i != id[i]) {
      id[i] = id[id[i]]; // path compression (halving): point i to its grandparent
      i = id[i];
    }
    return i;
  }

  boolean connected(int p, int q) { return root(p) == root(q); }

  // weighted union: link the root of the smaller tree under the larger one
  void union(int p, int q) {
    int i = root(p);
    int j = root(q);
    if (i == j) return; // already in the same set; without this check sz would be corrupted
    if (sz[i] < sz[j]) {
      id[i] = j;
      sz[j] += sz[i];
    } else {
      id[j] = i;
      sz[i] += sz[j];
    }
  }
}

The website also notes about its performance:

Theorem: Starting from an empty data structure, any sequence of M union and find operations on N objects takes O(N + M lg* N) time.

• Proof is very difficult.

• But the algorithm is still simple!

But I'm still curious where the iterated logarithm lg* N comes from. How is it derived? Can someone prove it, or at least explain it in an intuitive way?

Upvotes: 3

Views: 2107

Answers (2)

Ami Tavory

Reputation: 76316

To begin with, your question has a slight mistake: the complexity with just path compression is only O(m log(n)) (without the iterated log). See, for example, exercise 21.4-4 in Introduction To Algorithms. In fact, your block

    if (sz[i] < sz[j]) {
      id[i] = j;
      sz[j] += sz[i];
    } else {
      id[j] = i;
      sz[i] += sz[j];
    }

does weighted union (union by size), which plays the same role as union by rank in the analysis.

With both union by rank and path compression, though, the bound you quoted can be proved easily (much more easily than the inverse Ackermann one). The proof rests on three points:

  1. On each leaf-to-root path, the ranks of the nodes are strictly increasing. This relies on union by rank, and with it the claim is very easy to show.

  2. If the root of a tree has rank r, the tree has at least 2^r nodes. This can be shown by induction.

Based on 2., it's possible to show

  3. The number of nodes with rank r is at most n / 2^r.

The rest of the proof now follows by a clever arrangement of the worst possible way the ranks could be organized, which still shows that not too many are bad. For further details, see the Wikipedia entry on this.
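As an illustrative aside (not part of the proof): the reason a lg* bound is nearly as good as constant time is how absurdly slowly lg* grows. One float-safe way to compute it is via the equivalent tower definition, lg* n = the smallest k such that a tower of k twos (2^2^...^2) reaches n:

```java
public class LgStar {
  // lg* n: smallest k with 2↑↑k >= n; equivalently, how many times lg must be
  // applied to n before the result drops to at most 1.
  static int lgStar(double n) {
    int k = 0;
    double tower = 1; // 2↑↑0 = 1
    while (tower < n) {
      tower = Math.pow(2, tower); // grow the tower of twos by one level
      k++;
    }
    return k;
  }

  public static void main(String[] args) {
    System.out.println(lgStar(2));     // 1
    System.out.println(lgStar(16));    // 3: 16 -> 4 -> 2 -> 1
    System.out.println(lgStar(65536)); // 4: 65536 -> 16 -> 4 -> 2 -> 1
    System.out.println(lgStar(1e19));  // 5: so for any practical n, lg* n <= 5
  }
}
```

Since lg* n never exceeds 5 for any input that fits in memory, O(m lg* n) is, for all practical purposes, linear in m.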

Upvotes: 5

Scott Hunter

Reputation: 49848

As I recall, the proof had to do with amortizing the cost of the path compression over a set of searches. Shallow searches are cheap, and don't incur the cost of much path compression; deep searches cost a lot for that search and the path compression, but the path compression makes subsequent searches much cheaper on average.
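That amortization can be seen in a toy experiment (a hypothetical harness, separate from the question's class): build the worst-case chain of parent pointers by hand, then count pointer hops on repeated finds. The first find is expensive, but it compresses the path, so each later find costs roughly half the previous one.

```java
public class CompressionDemo {
  static int[] parent;
  static long steps; // parent-pointer hops: our stand-in for "cost"

  // same path-halving find as the question's root()
  static int root(int i) {
    while (i != parent[i]) {
      parent[i] = parent[parent[i]]; // point i past its parent to its grandparent
      i = parent[i];
      steps++;
    }
    return i;
  }

  public static void main(String[] args) {
    int n = 1 << 16;
    parent = new int[n]; // parent[0] stays 0: the root
    for (int i = 1; i < n; i++) parent[i] = i - 1; // worst case: one long chain

    for (int k = 1; k <= 5; k++) {
      steps = 0;
      root(n - 1);
      // the first find pays ~n/2 hops, but halves the path, so the cost of
      // each subsequent find drops geometrically; averaged over many finds,
      // the per-operation cost is small
      System.out.println("find " + k + ": " + steps + " hops");
    }
  }
}
```

The total work over the whole sequence of finds stays O(n), which is the intuition behind charging the compression cost to the searches that benefit from it.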

Upvotes: 1
