F.N.B
F.N.B

Reputation: 1619

Fortran + Openmp more slow that sequential

I have this sequential code in Fortran. My problem is, when I put Openmp directives, the paralleled code is more slow than the sequential, and I don't see the error.

REAL, DIMENSION(:), ALLOCATABLE :: current, next
ALLOCATE ( current(TOTAL_Z), next(TOTAL_Z))

CALL CPU_TIME(t1)

!$OMP PARALLEL SHARED (current, next) PRIVATE (z)
DO t = 1, TOTAL_TIME
    !$OMP  DO SCHEDULE(STATIC, 2)
    DO z = 2, (TOTAL_Z - 1)
        next(z) = current (z) + KAPPA*DELTA_T*((current(z - 1) - 2.0*current(z) +      current(z + 1)) / DELTA_Z**2)
    END DO
    !$OMP END DO
    current = next
END DO

CALL CPU_TIME(t2)

!$OMP END PARALLEL 

TOTAL_Z, TOTAL_TIME, KAPPA, DELTA_T, DELTA_Z are constants.
When I run the paralleled code, I see in htop and my 2 cores are working at 100%
In sequential code, CPU_TIME is 79 seg and in paralleled is 132 seg
Thank

Upvotes: 0

Views: 706

Answers (3)

Michael Klemm
Michael Klemm

Reputation: 2853

depending on the number of iterations, you might also be facing a problem with false-sharing on the nest array. Since the chunk size for the distribution of the DO loop is rather small, the cache line for nest(z), nest(z+1), nest(z+2), nest(z+3), etc might be thrashing between the L1/L2 caches of the CPU.

Cheers, -michael

Upvotes: 1

Pedro Inácio
Pedro Inácio

Reputation: 115

I've just been experiencing the same problem.

It seems that using cpu_time() is not suitable to measure the performance of multi-threaded code. cpu_time() will add the total time of all the threads which is likely to increase with increasing number of threads.

I've found this in another forum, http://software.intel.com/en-us/forums/topic/281897

You should use system_clock() or omp_get_wtime() functions to get a more accurate timing of your routine.

Upvotes: 3

M. S. B.
M. S. B.

Reputation: 29391

It is probably slow because of the threads are contending to access the shared variables. If you can change it to use reduction it would likely be faster. But that might not be easy since the calculation for "current" accesses multiple array elements.

Upvotes: 1

Related Questions