Reputation: 1864
I have a C program that must be parallelized using the OpenMP library. Its structure is:
for (t = 0; t < IT; ++t) {
    #pragma omp parallel for private(i, j, k, l) schedule(dynamic)
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            for (k = 0; k < n; ++k) {
                for (l = 0; l < n; ++l) {
                    // calculations 0
                }
            }
            // calculations 1
        }
    }
    #pragma omp parallel for private(i, j) schedule(dynamic)
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            // calculations 2
        }
    }
}
This program performs some calculations on a matrix. Calculations 2 must run only after calculations 0 and 1 have finished, because they modify the matrix.
The problem is that the speedup is very bad, i.e. the program is not scalable. The serial version for a given input runs in 79.46s. When running with two threads it finishes in 41s giving an almost perfect speedup of 1.93 times, but when running 3 threads it finishes in 37.86s (with a speedup of only 2.1 times), and with 4 threads it takes 34.104s (with a speedup of only 2.3 times).
Why is this not scalable?
PS. I have an Intel i5 430M with 4 cores.
Upvotes: 1
Views: 569
Reputation: 154
It does not scale well because of your processor. Your Core i5 (the 430M) has 2 physical cores and 4 hardware threads, so there are only 2 real cores, not 4. (This processor uses Hyper-Threading technology.)
The difference between a processor with 2 cores and 2 threads (e.g. a Pentium Dual-Core or Core 2 Duo) and your Core i5 (which has 2 cores and 4 threads and uses Hyper-Threading to present itself as four logical processors) is that Hyper-Threading CAN boost performance by up to roughly 30%. But you cannot compare your Hyper-Threaded Core i5 with a processor that has 4 physical cores, such as a quad-core Core i7, so a speedup close to 4 with four threads is not realistic on this machine.
Upvotes: 8