Pratik Shekhar
Pratik Shekhar

Reputation: 5

Create a HashSet with duplicate values

Is there a possibility that HashSet could have duplicate values in case of multiple threads adding items to it?

I'm not looking from modifying the equals or hashcode methods perspective but simply from multithreaded environment.

Upvotes: 0

Views: 1659

Answers (3)

Kawser Habib
Kawser Habib

Reputation: 1422

We can discuss this answer part by part:

  1. HashSet does not allow duplicate elements which means you can not store duplicate values in HashSet. And its alternative Hashmap doesn't allow duplicate keys however, it allows duplicate values.

  2. HashSet in Java is not thread-safe as it is not synchronized by default. If you are using HashSet in a multi-threaded environment where it is accessed by multiple threads concurrently and structurally modified too by even a single thread then it must be synchronized externally. A structural modification is defined as any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.

    So, when you update HashSet from multiple threads without external sync, its behavior will be unpredictable.

  3. To avoid this unpredictable behavior, we can synchronize HashSet by using Collections.synchronizedSet() method.

Example:

  • First, we’ll see an example what happens if HashSet is used in a multi-threaded environment without synchronizing it.

    In the Java code four threads are created, each of these thread adds 5 elements to the Set. After all the threads are done Set size should be 20.

public class SetSynchro implements Runnable{
  private Set<String> numSet;
  public SetSynchro(Set<String> numSet){
    this.numSet = numSet;
  }
  
  public static void main(String[] args) {
    Set<String> numSet = new HashSet<String>();
    /// 4 threads
    Thread t1 = new Thread(new SetSynchro(numSet));
    Thread t2 = new Thread(new SetSynchro(numSet));
    Thread t3 = new Thread(new SetSynchro(numSet));
    Thread t4 = new Thread(new SetSynchro(numSet));
        
    t1.start();
    t2.start();
    t3.start();
    t4.start();
        
    try {
      t1.join();
      t2.join();
      t3.join();
      t4.join();
    } catch (InterruptedException e) {    
      e.printStackTrace();
    }
    System.out.println("Size of Set is " + numSet.size());
  }

  @Override
  public void run() {
    System.out.println("in run method" + Thread.currentThread().getName());
    String str = Thread.currentThread().getName();
    for(int i = 0; i < 5; i++){
      // adding thread name to make element unique
      numSet.add(i + str);
      try {
        // delay to verify thread interference
        Thread.sleep(500);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
    }
  }
}

Output:

in run methodThread-2
in run methodThread-0
in run methodThread-3
in run methodThread-1
Size of Set is 19
//In one of the run size was 19, in another run 18 and sometimes even 20, so you can see that thread interference is making the behavior unpredictable.
  • So you can see that thread interference is making the behavior unpredictable. So we’ll synchronize the HashSet using the same example.
public class SetSynchro implements Runnable{
  private Set<String> numSet;

  public SetSynchro(Set<String> numSet){
    this.numSet = numSet;
  }

  public static void main(String[] args) {
    // Synchronized Set
    Set<String> numSet = Collections.synchronizedSet(new HashSet<String>());
    /// 4 threads
    Thread t1 = new Thread(new SetSynchro(numSet));
    Thread t2 = new Thread(new SetSynchro(numSet));
    Thread t3 = new Thread(new SetSynchro(numSet));
    Thread t4 = new Thread(new SetSynchro(numSet));
    
    t1.start();
    t2.start();
    t3.start();
    t4.start();
        
    try {
      t1.join();
      t2.join();
      t3.join();
      t4.join();
    } catch (InterruptedException e) {    
      e.printStackTrace();
    }
    System.out.println("Size of Set is " + numSet.size());
  }

  @Override
  public void run() {
    System.out.println("in run method" + Thread.currentThread().getName());
    String str = Thread.currentThread().getName();
    for(int i = 0; i < 5; i++){
      // adding thread name to make element unique
      numSet.add(i + str);
      try {
        // delay to verify thread interference
        Thread.sleep(500);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
    }
  }
}

Output:

in run methodThread-3
in run methodThread-2
in run methodThread-1
in run methodThread-0
Size of Set is 20
//Now every time size of HashSet is 20.

For more details, this link is useful. The code block is also taken from there.

Upvotes: 1

Stephen C
Stephen C

Reputation: 718788

Is there a possibility that HashSet could have duplicate values in case of multiple threads adding items to it?

HashSet is not a thread-safe class. If you update a HashSet from multiple threads without proper synchronization, then the behavior is unspecified, and difficult to predict. (And Java version dependent, given that the implementation of HashSet has changed a number of times over the lifetime of Java SE.)

The unspecified behavior could include duplicates appearing in the set as observed via the set's iterator.

If you want to share a (mutable) set between multiple threads, either use a ConcurrentHashSet or a Collections.synchronizedSet wrapper or an explicit Lock or mutex to synchronize operations.

(The different alternatives all have caveats associated with them. We can't recommend a specific alternative based on the limited information you have given.)

Upvotes: 1

hfontanez
hfontanez

Reputation: 6168

From Javadoc

Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:

Set s = Collections.synchronizedSet(new HashSet(...));

Basically, the way that I interpret this, is that under the right (or wrong) conditions, there might be a slight possibility that this could happen. HOWEVER, since the assumption is that you know that multiple threads will access this set, you need to synchronize access externally (since HashSet is not thread-safe). OR, in the absence of such external mechanism, you will need to wrap this set as shown above.

If you really want to find out, create an application with a lot of threads that attempt to set the same value. After insertion, print out some message to the console if the size of the set is ever greater than 1. That should tell you the answer. Maybe the chances of this happening are very small. But the class documentation tells you that if you expect to use in multithreaded process, you should synchronize it.

Upvotes: 0

Related Questions