Reputation: 1619
I have this sequential code in Fortran. My problem is that when I add OpenMP directives, the parallelized code runs slower than the sequential version, and I can't see what is wrong.
REAL, DIMENSION(:), ALLOCATABLE :: current, next
ALLOCATE ( current(TOTAL_Z), next(TOTAL_Z) )

CALL CPU_TIME(t1)
!$OMP PARALLEL SHARED (current, next) PRIVATE (z)
DO t = 1, TOTAL_TIME
    !$OMP DO SCHEDULE(STATIC, 2)
    DO z = 2, (TOTAL_Z - 1)
        next(z) = current(z) + KAPPA*DELTA_T*((current(z - 1) - 2.0*current(z) + current(z + 1)) / DELTA_Z**2)
    END DO
    !$OMP END DO
    current = next
END DO
CALL CPU_TIME(t2)
!$OMP END PARALLEL
TOTAL_Z, TOTAL_TIME, KAPPA, DELTA_T, DELTA_Z are constants.
When I run the parallelized code, I can see in htop that both of my cores are working at 100%.
In the sequential code CPU_TIME reports 79 seconds, and in the parallelized code it reports 132 seconds.
Thanks.
Upvotes: 0
Views: 706
Reputation: 2853
Depending on the number of iterations, you might also be facing a problem with false sharing on the next array. Since the chunk size for the distribution of the DO loop is rather small, the cache lines holding next(z), next(z+1), next(z+2), next(z+3), etc. might be thrashing between the L1/L2 caches of the CPU cores.
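As a rough sketch (reusing the names from your loop), simply dropping the chunk size lets each thread get one large contiguous block of z values, which should keep the two threads' writes to next on different cache lines:

!$OMP DO SCHEDULE(STATIC)
DO z = 2, (TOTAL_Z - 1)
    next(z) = current(z) + KAPPA*DELTA_T*((current(z - 1) - 2.0*current(z) + current(z + 1)) / DELTA_Z**2)
END DO
!$OMP END DO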
Cheers, -michael
Upvotes: 1
Reputation: 115
I've just been experiencing the same problem.
It seems that cpu_time() is not suitable for measuring the performance of multi-threaded code: it sums the CPU time of all the threads, so the reported time is likely to increase as the number of threads increases.
I've found this in another forum, http://software.intel.com/en-us/forums/topic/281897
You should use the system_clock() or omp_get_wtime() functions to get accurate wall-clock timing of your routine.
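For instance, a minimal sketch with omp_get_wtime() (the variable names are just placeholders):

USE omp_lib
DOUBLE PRECISION :: t_start, t_end

t_start = omp_get_wtime()     ! wall-clock time, not summed over threads
! ... the parallel region being timed ...
t_end = omp_get_wtime()
PRINT *, 'Wall time (s): ', t_end - t_start

Or equivalently with the standard intrinsic system_clock():

INTEGER :: c1, c2, rate
CALL SYSTEM_CLOCK(c1, rate)
! ... the parallel region being timed ...
CALL SYSTEM_CLOCK(c2)
PRINT *, 'Wall time (s): ', REAL(c2 - c1) / REAL(rate)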
Upvotes: 3
Reputation: 29391
It is probably slow because the threads are contending for access to the shared variables. If you can change it to use a reduction, it would likely be faster. But that might not be easy, since the calculation involving "current" accesses multiple array elements.
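Just to illustrate what the clause looks like (a made-up sum over the array for illustration only, not the heat-equation update itself):

REAL :: total

total = 0.0
!$OMP PARALLEL DO REDUCTION(+:total)
DO z = 1, TOTAL_Z
    total = total + current(z)    ! each thread accumulates a private copy, combined at the end
END DO
!$OMP END PARALLEL DO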
Upvotes: 1