Daniel Shapero

Reputation: 1889

Fortran erroneously calls a subroutine

I have some Fortran 90 code that I've been using for finite element computations. Lately, I've been trying to improve how it solves block linear systems. Before, I had a subroutine amux for sparse matrix-vector multiplication and another subroutine cg that implements the conjugate gradient method using amux. I wrote a new matrix-vector subroutine block_amux and, likewise, a new solver block_cg. By all rights, the new method should run faster, but instead it runs 10 times slower.

To track down the problem, I used the profiler gprof to see what was going on. It reported that 92.5% of the run time was spent in the cg subroutine, even though I never call it and rely exclusively on block_amux and block_cg. To muddy the waters even further, I put a print statement in the actual cg routine saying "Hello world"; it was never printed. Finally, I noticed that gprof lists no calls to the amux subroutine, even though a genuine call to cg would have performed hundreds of ordinary matrix-vector multiplications.

I'm mystified as to what could be doing this. Any thoughts? I can attach the gprof output if that helps too.

Update: I have made the following changes, each of which gives essentially the same result in one form or another:

  1. Change the names of the subroutines, for example cg becomes conjugate_gradient. Gprof then reports that I'm wasting time in the new conjugate_gradient routine.
  2. Move the subroutines that I actually use into my main program under a "contains" statement instead of the module linalg_mod in which they originally resided, then stop using the module containing the CG routine. Instead, the program wastes time in something called a "frame_dummy". This looks suspiciously similar to this post, but I can't
  3. Move the subroutines I use from linalg_mod, which contains the CG routine, to a new module linalg_mod_decoy, which does not contain it. Instead of wasting time in the CG algorithm, gprof says that the program is calling a subroutine I use to generate the right-hand side of the linear system ~3000 times instead of just once.
  4. Try it on a different computer. No difference.

Upvotes: 2

Views: 387

Answers (1)

duplode

Reputation: 34398

Quoting a comment by korrok, the question author:

OpenMP was the culprit. I figured that if I set the number of threads to 1 I would get the same result as profiling without OMP at all. When I stopped compiling with OpenMP it still performed poorly but correctly reported where all the work was being done.
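
One way to cross-check gprof's attribution, whether or not the code is built with OpenMP, is to time the suspect call by hand with the standard system_clock intrinsic. The following is only a minimal sketch: stand_in_solver is a hypothetical placeholder, not the block_cg routine from the question, and the loop inside it exists only to burn a measurable amount of time.

```fortran
program time_solver
    use iso_fortran_env, only: int64, real64
    implicit none

    integer(int64) :: t_start, t_end, clock_rate
    real(real64)   :: elapsed, checksum

    ! Ask the runtime how many clock ticks there are per second.
    call system_clock(count_rate=clock_rate)

    ! Wall-clock time around the call of interest. In real code the
    ! call below would be replaced by the actual solver call.
    call system_clock(t_start)
    call stand_in_solver(2000, checksum)   ! hypothetical placeholder
    call system_clock(t_end)

    elapsed = real(t_end - t_start, real64) / real(clock_rate, real64)
    print '(a, f10.6, a)', 'stand_in_solver: ', elapsed, ' s'
    ! Printing the result keeps the compiler from optimizing the work away.
    print '(a, es12.4)', 'checksum: ', checksum

contains

    ! Placeholder workload (an assumption for illustration only):
    ! an O(n**2) double loop that accumulates a sum.
    subroutine stand_in_solver(n, s)
        integer, intent(in)       :: n
        real(real64), intent(out) :: s
        integer :: i, j

        s = 0.0_real64
        do j = 1, n
            do i = 1, n
                s = s + 1.0_real64 / real(i + j, real64)
            end do
        end do
    end subroutine stand_in_solver

end program time_solver
```

Because this measures elapsed wall-clock time directly around the call, it does not rely on the profiler mapping samples to the right routine. In a build that does use OpenMP, omp_get_wtime from the omp_lib module should serve the same purpose.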

Upvotes: 1
