Reputation: 161
I am writing a parallel program that generates combinations in boolean vector form. Since the combinations are generated by separate threads, I combine them into a single set after every thread has finished its job.
set<vector<bool>> Ci[T.size()];
// generate combinations in parallel; thread i adds its results to Ci[i]
start_time = omp_get_wtime();
set<vector<bool>> C;
for (int i = 0; i < T.size(); i++) {
    cout << "i=" << i << "\t" << Ci[i].size() << endl;
    C.insert(Ci[i].begin(), Ci[i].end()); // combine the per-thread sets into a single set
}
end_time = omp_get_wtime();
total_time = end_time - start_time;
cout << " Total time:" << total_time << endl;
cout << C.size();
The total number of threads, T.size(), will vary, but the total number of combinations (C's size) is always the same. For example:
Enter threads:10
i=0 1
i=1 144
i=2 4320
i=3 47040
i=4 229320
i=5 550368
i=6 672672
i=7 411840
i=8 115830
i=9 11440
Total time:36.641s
C's size :2042975
Enter threads:128
i=0 9
i=1 45
i=2 165
...
i=120 10
i=121 11
i=122 12
i=123 13
i=124 14
i=125 15
i=126 16
i=127 18
Total time:6.432s
C's size :2042975
What I don't understand is that in both cases I am inserting the same number of combinations into C. Why is the time taken not the same?
Upvotes: 3
Views: 119
Reputation: 23497
We don't see all the necessary information, but I suspect the problem is caused by the uneven distribution of set sizes. The complexity of a range insertion is O(n log(m + n)), where n is the size of the set being inserted (Ci[i]) and m is the size of the set it is inserted into (C).
Now, consider that both sets together have 2^24 elements.
Case 1: n=2^16, m=2^24-2^16
Here, the term inside the big O becomes 2^16 * log(2^24 - 2^16 + 2^16), which is 2^16 * 24.
Case 2: n=2^23, m=2^23
Here, the same term is 2^23 * 24, which is 2^7 times larger than in Case 1.
My point is that if you "merge" two sets with the same total number of elements, the runtime may very much depend on how these elements are distributed among them.
In your first case (10 threads), the distribution of set sizes looks more like Case 2 (most of the sets hold a relatively large number of elements). We don't see the complete data for your second case (128 threads), but the set sizes appear to be distributed much more unevenly, since the ones shown contain only a handful of elements. (Note that this may, or may not, indicate that your parallel computation is poorly load balanced.)
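As a rough illustration (my own sketch, not your program; total, dst, src and the split sizes are arbitrary values I picked), the snippet below builds two std::sets that together always hold the same number of elements, splits them differently between source and destination, and times only the range insert. On a typical implementation, the near-even split should take noticeably longer than inserting a small set into a large one:

#include <chrono>
#include <cstdint>
#include <iostream>
#include <set>

int main() {
    const std::uint64_t total = 1ull << 21;            // combined size is the same in both runs

    for (std::uint64_t n : {1ull << 10, 1ull << 20}) { // n = size of the set being inserted
        std::set<std::uint64_t> dst, src;
        for (std::uint64_t v = 0; v < total; ++v) {
            // multiplicative hash so source and destination interleave in key order
            bool to_src = (v * 2654435761ull) % total < n;
            (to_src ? src : dst).insert(v);
        }

        const auto m = dst.size();                     // size of the set being inserted into
        const auto t0 = std::chrono::steady_clock::now();
        dst.insert(src.begin(), src.end());            // cost ~ n * log(m + n)
        const auto t1 = std::chrono::steady_clock::now();

        std::cout << "m=" << m << " n=" << src.size() << "  insert took "
                  << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    }
}

Adjust total if it runs too slowly on your machine; the point is only that the same combined number of elements can cost very different amounts of time to merge depending on the split.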
Upvotes: 3