Reputation: 5
I have been trying to implement the below code on quad core computer and average running times with No of threads in the Executor service over 100 iterations is as follows
1 thread = 78404.95
2 threads = 174995.14
4 thread = 144230.23
But according to what I have studied 2*(no of cores)
of threads should give optimal result for the program which is clearly not the case in my program which bizarrely gives best time for single thread.
Code :
import java.util.Collections;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class TestHashSet {
public static void main(String argv[]){
Set<Integer> S = Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());
S.add(1);
S.add(2);
S.add(3);
S.add(4);
S.add(5);
long startTime = System.nanoTime();
ExecutorService executor = Executors.newFixedThreadPool(8);
int Nb = 0;
for(int i = 0;i<10;i++){
User runnable = new User(S);
executor.execute(runnable);
Nb = Thread.getAllStackTraces().keySet().size();
}
executor.shutdown();
try {
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
long endTime = System.nanoTime();
System.out.println(0.001*(endTime-startTime)+" And "+Nb);
}
}
class User implements Runnable{
Set<Integer> S;
User(Set<Integer> S){
this.S = S;
}
@Override
public void run() {
// TODO Auto-generated method stub
Set<Integer> t =Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());;
for(int i = 0;i<10;i++){
t.add(i+5);
}
S.retainAll(t);
Set<Integer> t2 =Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());;
for(int i = 0;i<10;i++){
t2.add(i);
}
S.addAll(t);
/*
ConcurrentHashSet<Integer> D = new ConcurrentHashSet<Integer>();
for(int i=0;i<10;i++){
D.add(i+3);
}
S.difference(D);
*/
}
}
Update : If I increase no of queries per thread to 1000 , 4-threaded is performing better than Single threaded .I think overhead has been higher than run-time when I used only about 4 queries per thread and as no of queries increased Runtime is now greater than Overhead.Thanks
Upvotes: 0
Views: 536
Reputation: 718788
But 5 Threads Supposed to increase the performance..?
That's what >>you<< suppose. But in fact, there are no guarantees that adding threads will increase performance.
But according to what I have studied 2*(no of cores) of threads should give optimal result ...
If you read that somewhere, then you either misread it or it is plain wrong.
The reality is that the number of threads for optimal performance is highly dependent on the nature of your application, and also on the hardware you are running on.
Based on a cursory reading of your code, it appears that this is a benchmark to test how well Java deals with multi-threaded access and updates to a shared set (S
). Each thread is doing some operations on a thread-confined set, then either adding or removing all entries in the thread-confined set to the shared set.
The problem is that the addAll
and retainAll
calls are likely to be concurrency bottlenecks. A set based on ConcurrentHashMap will give better concurrent performance for point access / update to the set than on based on HashMap. However, addAll and retainAll perform N such operations, on the same entries that the other threads are operating on. Given the nature of this pattern of operations, you are likely to get significant contention within the different regions of the ConcurrentHashMap. That is likely to lead to one thread blocking another ... and a slowdown.
Update : If I increase no of queries per thread 4-threaded is performing better than Single threaded .I think overhead has been higher than run-time when I used only about 4 queries per thread and as no of queries increased Runtime is now greater than Overhead.
I assume that you mean that you are increasing the number of hash map entries. This is likely to reduce the average contention, given the way that ConcurrentHashMap
works. (The class divides the map into regions, and arranges that operations involving entries in different regions incur the minimum possible contention overheads. By increasing the number of distinct entries, you are reducing the probability that two simultaneous operations will lead to contention.)
So returning to the "2 x no of threads" factoid.
I suspect that the sources you have been reading don't actually say that that gives you optimal performance. I suspect that they really say that that:
"2 x no of threads" is a good starting point ... and you need to tune it for your application / problem / hardware, and/or
don't go above "2 x no of threads" for a compute intensive task ... because it is unlikely to help.
In your example, it is most likely that the main source of the contention is in the updates to the shared set / map ... and the overheads of ensuring that they happen atomically.
You can also get contention at a lower level; i.e. contention for memory bandwidth (RAM read/write) and memory cache contention. Whether that happens will depend on the specs of the hardware you are running on ...
The final thing to note is that your benchmark is flawed in that it does not allow for various VM warmup effects ... such as JIT compilation. The fact that your 2 thread times are more than double the 1 thread times points to that issue.
There are other questionable aspects about your benchmarking:
The amount of work done by the run()
method is too small.
This benchmark does not appear to be representative of a real-world use-case. Measuring speed-up in a totally fictitious (nonsense) algorithm is not going to give you any clues about how a real algorithm is likely to perform when you scale the thread count.
Running the tests on a 4 core machine means that you probably wouldn't have enough data points to draw scientifically meaningful conclusions ... assuming that the benchmark was sound.
Having said that, the 2 to 4 thread slowdown that you seem to be seeing is not unexpected ... to me.
Upvotes: 3