Reputation: 5166

Adavantages of HashSet over ArrayList and vice versa

I have a doubt regarding data structures in Java. While solving a typical hashing problem in Java, I was using the HashSet data structure, which worked fine until there were duplicate objects (object contents). Since HashSet does not support insert of duplicates, my logic was failing.

I replaced the hashset with the typical Arraylist since the methods of hashset such as .add(), .contains(), .remove() are supported in both, and my logic worked perfectly then.

But does this necessarily mean ArrayList is the logical choice over Hashset when duplicates are involved? There should be some time complexity advantages of Hashset over ArrayList right? Can someone please provide me some insight regarding this?

EDIT: What would be the ideal data structure when you want to do hashing when duplicates are involved. I mean when the duplicates should not be ignored and should be inserted.

Upvotes: 1

Answers (4)

Mshnik

Reputation: 7032

If you specifically need a HashSet that handles duplicates, a HashMap will be able to do the job. If you just need a count of the number of objects added (with quick lookup/etc), a HashMap<T,Integer> will be ideal, where T is the type of your object. If you actually need to keep references to the duplicate objects you've added, go with HashMap<T, List<T>>. That way you can look up by using HashMap's .containsKey(T t), and iterate through all of the similarly hashing objects in the resulting list. So for example, you could create this class:

public class HashSetWithDuplicates<T> {

    private HashMap<T, List<T>> entries;
    private int size;

    public HashSetWithDuplicates(){
        entries = new HashMap<>();
        size = 0;
    }

    public HashSetWithDuplicates(Collection<? extends T> col){
        this();
        for(T t : col){
            add(t);
        }
    }

    public boolean contains(T t){
        return entries.containsKey(t);
    }

    public List<T> get(T t){
        return entries.get(t);
    }

    public void add(T t){
        if (!contains(t)) entries.put(t, new ArrayList<>());

        entries.get(t).add(t);
        size++;
    }

    public void remove(T t){
        if (!contains(t)) return;
        entries.get(t).remove(t);
        if(entries.get(t).isEmpty()) entries.remove(t);
        size--;
    }

    public int size(){
        return size;
    }

    public boolean isEmpty(){
        return size() == 0;
    }
}

Add functionality to your needs.

Upvotes: 0

gdejohn

Reputation: 7579

It's not clear what you mean by a "hashing problem," but maybe you're looking for a multiset. From the Guava docs:

A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.

Elements of a multiset that are equal to one another are referred to as occurrences of the same single element. The total number of occurrences of an element in a multiset is called the count of that element (the terms "frequency" and "multiplicity" are equivalent, but not used in this API).

No such thing exists in the JDK.

Upvotes: 4