HammerMeetNail
HammerMeetNail

Reputation: 350

Testing Weighted Quick Union Algorithm?

I am taking Algorithms, Part I via Coursera and am looking to test the run time of the Quick Find, Quick Union, and Weighted Quick Union algorithms. The course is in Java, which I am unfamiliar with, so I have gone through and attempted to recreate the algorithms in Python, which I am more familiar with.

Now that I have everything implemented, I aim to test each function to verify run time/complexity. I had been thinking of using the timeit library, but that seems to be throwing incorrect results, e.g., Weighted Quick Union takes longer to complete than QuickUnion.

How can I verify that a Weighted Quick Union is in fact O(log n) and is faster than Quick Union? Here is what I have created and tried so far:

O(N**2) - Slow

class QuickFind_Eager:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        for pos, val in enumerate(self.array):
            if self.array[pos] == self.array[first_node]:
                self.array[pos] = self.array[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.array[first_node] == self.array[second_node]

O(N) - Still too slow - Avoid

class QuickUnion_Lazy:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]

    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            parent = self.array[parent]
        return parent

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        self.array[first_node] = self.array[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)

O(log N) - Pretty darn fast

class WeightedQuickUnion:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]
        self.weight = [num for num in range(nodes)]

    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            parent = self.array[parent]
        return parent

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        if self.root(first_node) == self.root(second_node):
            return

        if (self.weight[first_node] < self.weight[second_node]):
            self.array[first_node] = self.root(second_node)
            self.weight[second_node] += self.weight[first_node]
        else:
            self.array[second_node] = self.root(first_node)
            self.weight[first_node] += self.weight[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)

O(N + M lg* N) - wicked fast

class WeightedQuickUnion_PathCompression:
    def __init__(self, nodes):
        self.array = [num for num in range(nodes)]
        self.weight = [num for num in range(nodes)]

    # Follows parent pointers to actual root
    def root(self, parent):
        while parent != self.array[parent]:
            self.array[parent] = self.array[self.array[parent]]
            parent = self.array[parent]
        return parent

    # Joins two nodes into a component
    def union(self, first_node, second_node):
        if self.root(first_node) == self.root(second_node):
            return

        if self.weight[first_node] < self.weight[second_node]:
            self.array[first_node] = self.root(second_node)
            self.weight[second_node] += self.weight[first_node]
        else:
            self.array[second_node] = self.root(first_node)
            self.weight[first_node] += self.weight[second_node]

    # Checks if two nodes are in the same component
    def connected(self, first_node, second_node):
        return self.root(first_node) == self.root(second_node)

Test run time

def test_quickfind(quickfind):
    t = quickfind(100)
    t.union(1,2)
    t.connected(1,2)
    t.union(4,2)
    t.union(3,4)
    t.connected(0,2)
    t.connected(1,4)
    t.union(0,3)
    t.connected(0,4)

import timeit

t = timeit.timeit(stmt="test_quickfind(QuickFind_Eager)", setup="from __main__ import QuickFind_Eager; from __main__ import test_quickfind", number=100000)
print(t)
# 11.4380569069981
t = timeit.timeit(stmt="test_quickfind(QuickUnion_Lazy)", setup="from __main__ import QuickUnion_Lazy; from __main__ import test_quickfind", number=100000)
print(t)
# 1.4744456350017572
t = timeit.timeit(stmt="test_quickfind(WeightedQuickUnion)", setup="from __main__ import WeightedQuickUnion; from __main__ import test_quickfind", number=100000)
print(t)
# 2.738758583996969
t = timeit.timeit(stmt="test_quickfind(WeightedQuickUnion_PathCompression)", setup="from __main__ import WeightedQuickUnion_PathCompression; from __main__ import test_quickfind", number=100000)
print(t)
# 3.0113827050008695

Update Added results from timeit.

Upvotes: 4

Views: 2084

Answers (2)

SawlStone
SawlStone

Reputation: 1

Implementation of union function in QuickFindEager class is not correct. self.array[first_node] and self.array[second_node] should be added to variables before loop and than changed in loop from variables

def union(self, first_node, second_node):
    pid = self.array[first_node]
    qid = self.array[second_node]
    for pos, val in enumerate(self.array):
        if self.array[pos] == pid:
            self.array[pos] = qid

Upvotes: 0

collapsar
collapsar

Reputation: 17238

You need to tabulate the algorithms' running times as a function of problem size, ie. calling quickfind for different problem sizes ( say 100,200,300,400,500; beware, expect the latter to run at least 3 minutes for the naive O(n^2) algorithm ).

You still have no guarantees that you observe the asymptotic run time functions (that's what the O notation is about: O(f) actually describes a family of functions g_i, g_i = a_i * f(n) + b_i; a_i, b_i: const [sort of abusing the notation]), since some of your implementations may run into a resource exhaustion (read: no more ram) resulting in significant performance hits beyond the realm of your implementations.

Upvotes: 2

Related Questions