Problem removing from Java-HashSet while implementing Kruskal's algorithm

in the following code I am trying to implement Kruskal's algorithm to compute the minimal spanning tree of a graph.

The problem is that removing sets from the connected components does not work properly and that the case if(!startSet.equals(endSet)) always seems to be executed.

Now the code:

    import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

import org.jgrapht.Graph;
import org.jgrapht.graph.AsSubgraph;
import org.jgrapht.graph.DefaultWeightedEdge;
import java.util.PriorityQueue;

public class mySpanningTree {
    
    //Receives a graph and computes a minimum spanning tree
    public static AsSubgraph<Integer, DefaultWeightedEdge> computeMST(Graph<Integer, DefaultWeightedEdge> graph) {
        AsSubgraph<Integer, DefaultWeightedEdge> tree = new AsSubgraph<Integer, DefaultWeightedEdge>(graph, graph.vertexSet(), new HashSet<DefaultWeightedEdge>());

        PriorityQueue<DefaultWeightedEdge> edgeQueue = new PriorityQueue<>(Comparator.comparingDouble(graph::getEdgeWeight));
        edgeQueue.addAll(graph.edgeSet());
        Set<Set<Integer>> connectedComponents = new HashSet<>();
        for(Integer i : graph.vertexSet()){
            Set<Integer> set = new HashSet<>();
            set.add(i);
            connectedComponents.add(set);
        }
        int n = tree.vertexSet().size() - 1;
        while (!edgeQueue.isEmpty() && tree.edgeSet().size() < n) {
            DefaultWeightedEdge edge = edgeQueue.poll();
            Integer start = graph.getEdgeSource(edge);
            Integer end = graph.getEdgeTarget(edge);
            Set<Integer> startSet =  null;
            Set<Integer> endSet = null;
            for(Set<Integer> set: connectedComponents){
                if(set.contains(start)){
                    startSet = set;
                }
                if(set.contains(end)){
                    endSet = set;
                }
            }
            if(!startSet.equals(endSet)){
                startSet.addAll(endSet);
                connectedComponents.remove(endSet);
                tree.addEdge(start, end, edge);
            }
            else{
            }
        }
        return tree;
    }

The idea of the code is to keep track of the connected components in a set. Whenever an edge connects two nodes from different connected components, I want to union the two connected components and store the result inside the set instead of the two components from before. Furthermore, the edge shall be added. Whenever an edge has two endpoints in one and the same set, it shall not be added (since adding it would create a cycle).

Any help is appreciated!

Upvotes: 0

Answers (3)

Dirk

Reputation: 31053

The value of connectedComponents is a mutable set, whose elements are also mutable sets. The problem is, that by destructively modifying the elements of connectedComponents without "notifying" the container, the container's data structures are silently corrupted.

Hash sets are managed data structures, that use an integer value (the hashCode) derived from the contents of their elements to guide where they put the object internally in their private data structures. This mechanism gives fast insertion and retrieval (usually in the order of O(1)). However, for this to work, the hashCode must not change while the object is present in the container.

In your code, the container is a HashSet<HashSet<Integer>> and the elements are HashSet<Integer> instances. For Java's sets, the hash code is computed based on the what the set currently contains. So, when you do

Set<Integer> set = new HashSet<>();
set.add(i);
connectedComponents.add(set);

the connectedComponents container computes a hash code from set and uses that to determine a place to put the object into in its internal data structures. However, when you do

startSet.addAll(endSet);

you mutate the contents of startSet which in turn causes its hash code to change. From connectedComponent's perspective, the object is now "in the wrong place" in its internal data structures, and hence it is likely, that it will not be found again. The important thing here is, that data structure like HashSet are based on equals and hashCode, and for many data structures (including HashSet) those are based on the actual contents, not merely on object identity.

To address this problem, try removing the element set from connectedComponents before mutating it, and then add it back afterward:

if(!startSet.equals(endSet)){
    connectedComponents.remove(startSet);  // Added back below ...
    connectedComponents.remove(endSet);
    startSet.addAll(endSet);
    connectedComponents.add(startSet);     // ... right here
    tree.addEdge(start, end, edge);
}

Alternatively, it might help to keep track of the element sets using a different container, like an IdentityHashMap (use whatever you like as the value). This would work here, since you do not need the container to track the element sets by their content (as HashSet implicitly does).

For example:

IdentityHashMap<Set<Integer>, Object> connectedComponents = new IdentityHashMap<>();

...

for(Integer i : graph.vertexSet()){
    Set<Integer> set = new HashSet<>();
    set.add(i);
    connectedComponents.put(set, set);
}

...

for(Set<Integer> set: connectedComponents.keySet()) {
    ...
    if(set.contains(start)){
        startSet = set;
    }
    if(set.contains(end)){
        endSet = set;
    }
}

if(!startSet.equals(endSet)) {
    connectedComponents.remove(endSet);
    startSet.addAll(endSet);
    tree.addEdge(start, end, edge);
}

I am not sure right now, whether it should remain !startSet.equals(endSet) or whether you might want to use startSet != endSet with the modified code, though. The former version uses "deep equality" for the test (i.e., considers the actual contents the the sets involved) whereas the latter version works by object identity only, which I think would be more appropriate if you consider the IdentityHashMap route.

Upvotes: 1

Alex Veleshko

Reputation: 1242

A typical implementation of Kruskal's algorithm uses disjoint-set data structure for this. With this data structure your code simplifies to the following:

// Receives a graph and computes a minimum spanning tree
public static AsSubgraph<Integer, DefaultWeightedEdge> computeMST(Graph<Integer, DefaultWeightedEdge> graph) {
    var tree = new AsSubgraph<>(graph, graph.vertexSet(), new HashSet<>());
    var edgeQueue = new PriorityQueue<>(Comparator.comparingDouble(graph::getEdgeWeight));
    edgeQueue.addAll(graph.edgeSet());
    DisjointSet connectedComponents = new DisjointSet(graph.vertexSet());
    int n = tree.vertexSet().size() - 1;
    
    while (!edgeQueue.isEmpty() && tree.edgeSet().size() < n) {
        DefaultWeightedEdge edge = edgeQueue.poll();
        Integer start = graph.getEdgeSource(edge);
        Integer end = graph.getEdgeTarget(edge);

        if (connectedComponents.findSet(start) != connectedComponents.findSet(end)) {
            connectedComponents.union(start, end);
            tree.addEdge(start, end, edge);
        }
    }
    
    return tree;
}

Introduction to Algorithms, 3rd edition by Thomas H. Cormen et al. devotes the whole Chapter 21 to the disjoint-set data structure. For a given set of elements, the data structure holds its subsets such that

every elements belongs to a subset held in the data structure and
subsets in the data structure do not intersect (i.e., disjoint).

Initially, all elements belong to its own, singleton, subset. The data structure allows to check if two elements belong to the same subset and to join subsets. This makes it ideal for graph algorithms: elements are vertices and subsets are connected components of the graph.

public class DisjointSet {
    private Map<Integer, Entry> setEntries;
    
    private static class Entry {
        private Integer parent;
        private int rank;
    }
    
    // Creates a collection of singleton sets.
    public DisjointSet(Set<Integer> vertices) {
        setEntries = new HashMap<>();
        
        for (Integer vertex: vertices) {
            Entry r = new Entry();
            r.parent = vertex;
            r.rank = 0;
            setEntries.put(vertex, r);
        }
    }
    
    // Find the representative member of the subset containing the given element.
    public Integer findSet(Integer x) {
        Entry r = setEntries.get(x);
        
        if (r.parent != x) {
            r.parent = findSet(r.parent);
        }
        
        return r.parent;
    }
    
    // Joins two subsets containing the given elements.
    public void union(Integer x, Integer y) {
        link(findSet(x), findSet(y));
    }
    
    private void link(Integer x, Integer y) {
        Entry rx = setEntries.get(x);
        Entry ry = setEntries.get(y);
        
        if (rx.rank > ry.rank) {
            ry.parent = x;
        } else {
            rx.parent = y;
            
            if (rx.rank == ry.rank) {
                ry.rank++;
            }
        }
    }
}

Upvotes: 1

Golibjon Odinaev

Reputation: 1

Here you must use from interfaces, befor create

interface A {
    int set1 {
        Set<Integer> set = new HashSet<>();
        set.add(i);
        return set;
    }
}

interface B extends A {
    Set<Integer> connectedComponents = new HashSet<>();
    for (Integer i : graph.vertexSet()) {
        Set<Integer> set = new HashSet<>();
        set.add(i);
        connectedComponents.add(set);
    }

...etc. So you can solve problem step by step.

Upvotes: -2

Problem removing from Java-HashSet while implementing Kruskal&#39;s algorithm

Answers (3)

Related Questions

Problem removing from Java-HashSet while implementing Kruskal's algorithm