Update: NVIDIA responded to my bug report (link is at bottom of this description) stating that they are able to reproduce the problem locally, and will have people look into it.
This is a complex problem which I've tried to abbreviate to the bare essentials. I think it would be easier for interested parties to refer to a Google Doc I wrote, "Report on NVIDIA HPC pgfortran / nvfortran modules memory problem (abbreviated version)"; that document contains a link to a more thorough one.
I will try to replicate the "abbreviated description" here.
I have a sample code consisting of a small module and a main program in a single source file.
The purpose of the main program is to read the contents of a namelist file into variables.
The variables (except for one) are declared in the module, which also declares a large static array.
As written (see below), the program segfaults at the READ statement, but if I move the declarations of the namelist variables BEFORE the array declaration, it reads everything fine.
Or, if I make the static array smaller, everything works fine.
The problem shows up when using the NVIDIA HPC SDK Fortran compilers. I don’t run into the problem when compiling with gfortran or ifort.
    ! Test module defining 3D dimensions, declaring 3D arrays,
    ! and variables to be filled from a namelist.
    MODULE my_mod

        IMPLICIT NONE

        INTEGER, PARAMETER :: nxmax=2881, nymax=1441, nzmax=138
        !!!INTEGER, PARAMETER :: nxmax=361, nymax=181, nzmax=138

        ! 3d fields
        !**********
        real :: a(nxmax, nymax, nzmax)

        ! Declaring these namelist values here results in seg fault
        ! upon reading the namelist. If I move these above the declaration
        ! of 3D array, everything works fine
        integer :: numxgrid,numygrid
        real :: dxout,dyout,outlon0,outlat0

    END MODULE my_mod

    !=======================================================

    PROGRAM modmemtest

        USE my_mod

        IMPLICIT NONE

        REAL :: outheights

        NAMELIST /outgrid/ &
            & outlon0, outlat0, &
            & numxgrid, numygrid, &
            & dxout, dyout, &
            & outheights

        OPEN(48, FILE="OUTGRID", status='old', form='formatted')
        PRINT *, 'BEFORE READING NAMELIST'
        READ(48, outgrid)
        PRINT *, 'AFTER READING NAMELIST'
        CLOSE(48)

        PRINT *, 'outlon0, outlat0: ', outlon0, outlat0
        PRINT *, 'numxgrid, numygrid: ', numxgrid, numygrid
        PRINT *, 'dxout, dyout: ', dxout, dyout
        PRINT *, 'outheights: ', outheights

    END PROGRAM modmemtest
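The program reads a file named OUTGRID, whose contents are not shown in the post. A minimal sketch of such a file follows. The values are an assumption, chosen only to match the output reported for a successful run; the real file may differ in layout and precision.

```fortran
&outgrid
 ! Hypothetical values inferred from the printed output; not the original file
 outlon0    = 12.0,
 outlat0    = 47.0,
 numxgrid   = 4,
 numygrid   = 3,
 dxout      = 1.0,
 dyout      = 1.0,
 outheights = 5000.0
/
```

The group name after `&` has to match the NAMELIST group name in the program (`outgrid`), and the variable names are matched case-insensitively.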
Correct behaviour (moving the namelist variable declarations before the array declaration, or declaring a smaller array) looks like this:

    $ pgfortran -o minitest -mcmodel=medium MINITEST.f90
    $ ./minitest
     BEFORE READING NAMELIST
     AFTER READING NAMELIST
     outlon0, outlat0: 12.00000 47.00000
     numxgrid, numygrid: 4 3
     dxout, dyout: 1.000000 1.000000
     outheights: 5000.000
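For reference, the reordering workaround described earlier can be sketched as the module below. Only the order of the declarations changes; the names and dimensions are those from the listing in the post, and the PROGRAM unit is assumed to stay exactly as it was.

```fortran
! Sketch of the reordered module: namelist variables first, large array last.
MODULE my_mod

    IMPLICIT NONE

    INTEGER, PARAMETER :: nxmax=2881, nymax=1441, nzmax=138

    ! Namelist variables, now declared BEFORE the large array
    ! (the ordering the post reports as reading the namelist correctly)
    integer :: numxgrid,numygrid
    real :: dxout,dyout,outlon0,outlat0

    ! 3d fields
    real :: a(nxmax, nymax, nzmax)

END MODULE my_mod
```

This only sidesteps the symptom. It does not explain why the original ordering fails, so it is probably best treated as a temporary measure until the compiler issue is resolved.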
Incorrect behaviour looks like this:

    $ pgfortran -o minitest -mcmodel=medium MINITEST.f90
    $ ./minitest
     BEFORE READING NAMELIST
    Segmentation fault
This happens on two different systems with two different versions of the HPC SDK - RHEL 7 with NVIDIA HPC SDK v21.9, and Ubuntu 20.04 (an AWS instance) with NVIDIA HPC SDK v22.1.
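As a sanity check on the sizes involved, the short program below computes the static storage needed by the array for both sets of dimensions in the listing. It assumes the default REAL occupies 4 bytes, which is the usual default but is not stated in the post. With that assumption the large array comes to 2,291,639,592 bytes, which is more than 2 GiB (2,147,483,648 bytes), while the small one is 36,068,232 bytes. Whether crossing the 2 GiB mark is what actually triggers the fault is not established here; it is only a plausible thing to look at, given that the variables declared after the array would sit beyond that offset.

```fortran
! Sketch: static storage needed by array "a" for both dimension sets.
! Assumption: default REAL is 4 bytes.
PROGRAM array_size_check

    USE, INTRINSIC :: iso_fortran_env, ONLY: int64

    IMPLICIT NONE

    INTEGER(int64), PARAMETER :: bytes_per_real = 4_int64  ! assumed REAL size
    INTEGER(int64), PARAMETER :: two_gib = 2_int64**31     ! 2 GiB in bytes
    INTEGER(int64) :: big_bytes, small_bytes

    ! Dimensions copied from the module in the post
    big_bytes   = 2881_int64 * 1441_int64 * 138_int64 * bytes_per_real
    small_bytes =  361_int64 *  181_int64 * 138_int64 * bytes_per_real

    PRINT *, 'large array bytes: ', big_bytes
    PRINT *, 'small array bytes: ', small_bytes
    PRINT *, '2 GiB in bytes:    ', two_gib
    PRINT *, 'large array exceeds 2 GiB: ', big_bytes > two_gib

END PROGRAM array_size_check
```

The 64-bit integer kind is used because the byte count of the large array does not fit in a default 32-bit INTEGER.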
Again, there's a link at the top of this description to a Google Doc (web-shared) that's roughly what I wrote above, but in it is a link to a document with much more information.
Related to all of this, I'm wondering if anybody is able to comment on the "maturity" of these new "PGI" Fortran compilers. I used the PGI compilers extensively in the old days, but the impression I'm getting from looking through problem reports (this isn't my first problem) is that NVIDIA may be struggling to bring its product back up to the standards of the original PGI compilers. One thing that comes to mind is the lack of support for -mcmodel=large, and it seems I've seen some comments from NVIDIA that some of these features are not "yet" incorporated in the new product?
Added note - I submitted a bug report to NVIDIA, but somehow my nicely spaced text got screwed up in it. It's at - https://developer.nvidia.com/nvidia_bug/3721835