Tags: arrays, fortran, dynamic-memory-allocation, intel-fortran

Reputation: 1209

Understanding a deallocation error

I wrote a small, simple code that replicates an error I get in another, much larger code:

PROGRAM allocateBug                                                              
    IMPLICIT NONE                                                                  
    INTEGER, PARAMETER :: Nx = 10                                                
    INTEGER, PARAMETER :: Ny = 20                                                
    INTEGER, PARAMETER :: Nz = 30                                                
    REAL, ALLOCATABLE, DIMENSION(:,:,:) :: a                                 

    ALLOCATE(a(0:Nx-1,0:Ny-1,0:Nz-1))                          

    a(Nx+2,:,:) = 0.4                                                            

    PRINT*, "size(a) = ", SIZE(a,1)                                              
    DEALLOCATE(a)  
END PROGRAM allocateBug

The output of the code is:

`size(a) = 10`

It is followed by this error message:

*** glibc detected *** ./a.out: free(): invalid next size (normal): 0x0000000001a97060 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7f652d0bcb96]
./a.out[0x40719c]
./a.out[0x402ebf]
./a.out[0x402bc6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f652d05f76d]
./a.out[0x402ab9]
(... more lines ...)

I do not get an error when accessing the array a out of bounds, which is behavior I already knew from ifort. Why does the error appear only when deallocating the array? Also, if I access a at Nx or Nx+1, the code exits with no errors.

EDIT

To clarify my question, when printing the size of a, the code tells me that it still considers a to be of size 10 in the first dimension. However, the error while deallocating a tells me that something was changed in the state of a while writing to it out of bounds. I'm just very curious about exactly what happens during this code so that an error occurs.

Upvotes: 1

Answers (3)

casey

Reputation: 6915

First off, writing out of bounds is an undefined operation and as such your whole program becomes undefined. Anything it does is correct at that point. Whether your program runs properly, crashes, doesn't run at all, or some other option, that is a correct result of undefined behavior and not a bug.

Per your comments, you are more interested in exactly how the bad write causes the inability to deallocate it from a low level perspective rather than just accepting that undefined behavior can do whatever it wants to do.

Let's first look at your array a:

a(0:9,0:19,0:29)

of shape (10,20,30), which has 6,000 elements of 4-byte floating point values, for a total of 24,000 bytes of storage. Your undefined write is

a(12,:,:) = 0.4

This writes to the 600 elements a(12,0:19,0:29), though only one of them lies outside the allocated memory. Fortran stores arrays in column-major order, so a(12,j,k) sits at linear offset 12 + 10*j + 200*k from the start of the array, and a(12,19,29) lands on what would be the 6003rd element of a 6,000-element array. The other 599 writes stay inside the allocated memory but corrupt the contents of the array by landing on the wrong elements.

If your variable a was allocated to addresses 0x0000-0x5DBF then element (9,19,29) would be at address 0x5DBC-0x5DBF, and your out of bounds write to element (12,19,29) would be at 0x5DC8-0x5DCB, or 8-12 bytes beyond the end of the array.
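The address arithmetic above can be reproduced with a short sketch. It assumes column-major storage with zero-based lower bounds (as in the question) and a 4-byte default REAL; the name linear_offset is a hypothetical helper, not part of any compiler runtime.

```fortran
program offset_sketch
    implicit none
    integer, parameter :: nx = 10, ny = 20, nz = 30
    integer, parameter :: bytes_per_real = 4   ! assumption: default REAL is 4 bytes
    integer :: j, k, n_outside

    print '(a,i0)', 'elements allocated:         ', nx*ny*nz
    print '(a,i0)', 'offset of a(9,19,29):       ', linear_offset(9, 19, 29)
    print '(a,i0)', 'offset of a(12,19,29):      ', linear_offset(12, 19, 29)
    print '(a,z0)', 'byte offset of that (hex):  ', linear_offset(12, 19, 29)*bytes_per_real

    ! Count how many of the 600 writes in a(12,:,:) fall past the allocation.
    n_outside = 0
    do k = 0, nz - 1
        do j = 0, ny - 1
            if (linear_offset(12, j, k) >= nx*ny*nz) n_outside = n_outside + 1
        end do
    end do
    print '(a,i0)', 'writes past the allocation: ', n_outside

contains

    ! Zero-based position of a(i,j,k) in storage for a(0:nx-1,0:ny-1,0:nz-1).
    pure integer function linear_offset(i, j, k)
        integer, intent(in) :: i, j, k
        linear_offset = i + nx*(j + ny*k)
    end function linear_offset

end program offset_sketch
```

The sketch only computes offsets and never touches an array, so it is safe to run. The last in-bounds element comes out at offset 5999 and a(12,19,29) at offset 6002, which matches the 6003rd element and the byte address 0x5DC8 quoted above.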

What follows next is implementation dependent and is based upon an analysis of gfortran 4.9.2.

Unlike C, arrays in Fortran have metadata known as "array descriptors". GNU gfortran uses the following descriptor for arrays of 4 byte reals:

 typedef struct gfc_array_r4 {
   GFC_REAL_4 *base_addr;                         /* start of the array data */
   size_t offset;
   index_type dtype;
   descriptor_dimension dim[GFC_MAX_DIMENSIONS];  /* one entry per dimension */
 } gfc_array_r4;

The member dim is an array of length GFC_MAX_DIMENSIONS whose entries are of type descriptor_dimension, which has the following structure:

 typedef struct descriptor_dimension
 {
   index_type _stride;
   index_type lower_bound;
   index_type _ubound;
 } descriptor_dimension;

The reason your example code can still tell you the proper size of a is that this metadata holds that information.
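As a minimal sketch of that point, the standard inquiry intrinsics below report bounds and extents without reading any array data, which is why they keep returning sensible values even after the data area (or memory next to it) has been trampled. The program is illustrative only and performs no out-of-bounds access.

```fortran
program descriptor_inquiry
    implicit none
    integer, parameter :: nx = 10, ny = 20, nz = 30
    real, allocatable :: a(:,:,:)

    allocate(a(0:nx-1, 0:ny-1, 0:nz-1))

    ! These intrinsics report the bounds recorded at allocation time;
    ! they do not inspect the contents of the array.
    print '(a,3(1x,i0))', 'lbound:', lbound(a)
    print '(a,3(1x,i0))', 'ubound:', ubound(a)
    print '(a,3(1x,i0))', 'shape: ', shape(a)
    print '(a,1x,i0)',    'size:  ', size(a)

    deallocate(a)
end program descriptor_inquiry
```

Where the compiler keeps this information is implementation specific; the sketch only shows that it is queried separately from the data.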

Following the internal codepaths for allocatable components is more difficult and I don't have the time to do a proper inspection. From a cursory look, however, it appears there is more metadata associated with allocatable types and various allocation strategies (malloc and others).

The only general statement I can make from the above is that some vital piece of data needed by the deallocation path (the message itself comes from glibc's free()) was located, at least in part, in the memory 8-12 bytes beyond the end of the array. When you wrote to the memory between 0 and 8 bytes past the end of the array (indices Nx and Nx+1) and saw no fatal runtime errors, you did not overwrite that vital data. The specifics of the data you are corrupting and how it is arranged in the heap relative to your array are heavily implementation dependent, not only between compiler vendors but potentially also between compiler releases.

Furthermore, note that writing to an array element like a(12,0,0) is in-bounds with respect to the allocated array memory but out-of-bounds with respect to your declared bounds. It will not emit a runtime error without bounds checking, but a(12,0,0) is the same element as a(2,1,0) in memory, so your out-of-bounds writes are clobbering in-bounds values.
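To see which in-bounds element an unchecked subscript lands on, the storage offset can be mapped back to valid subscripts. This is a sketch assuming column-major storage and the zero-based bounds from the question; unflatten is a hypothetical helper name.

```fortran
program alias_sketch
    implicit none
    integer, parameter :: nx = 10, ny = 20, nz = 30
    integer :: off, idx(3)

    ! Storage offset that an unchecked reference to a(12,0,0) resolves to.
    off = 12 + nx*(0 + ny*0)
    idx = unflatten(off)
    print '(a,i0,a,3(1x,i0))', 'a(12,0,0) -> offset ', off, ' -> element', idx

contains

    ! Convert a zero-based storage offset back to in-bounds subscripts (i,j,k).
    pure function unflatten(offset_in) result(subs)
        integer, intent(in) :: offset_in
        integer :: subs(3)
        subs(1) = mod(offset_in, nx)
        subs(2) = mod(offset_in/nx, ny)
        subs(3) = offset_in/(nx*ny)
    end function unflatten

end program alias_sketch
```

For offset 12 the helper returns subscripts (2,1,0), the element named above. Only integers are manipulated here, so no array is actually accessed out of bounds.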

Upvotes: 4

user4490638

Reputation:

ALLOCATE(a(0:Nx-1,0:Ny-1,0:Nz-1))                          

a(Nx+2,:,:) = 0.4

Here, you allocate a with its first dimension ranging from 0 to Nx-1. Then you assign a value to it outside these bounds, at index Nx+2.

Bad idea. Either you get something strange, such as heap corruption, or you set the right compiler flags and get a runtime error. With -fcheck=all, gfortran complains that

At line 10 of file a.f90
Fortran runtime error: Index '12' of dimension 1 of array 'a' outside of expected range (0:9)

which is a clear enough indication of where the error is.
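Compiler bounds checking is the simplest safeguard, but when an index is computed at run time it can also be checked by hand. The following is a rough sketch of such a guard using the LBOUND and UBOUND intrinsics; the program name and the message text are illustrative choices, not anything produced by a compiler.

```fortran
! Compile-time alternative (flags as mentioned in the answers):
!   gfortran -fcheck=all ...    or    ifort -check bounds ...
program guarded_write
    implicit none
    integer, parameter :: nx = 10, ny = 20, nz = 30
    real, allocatable :: a(:,:,:)
    integer :: i

    allocate(a(0:nx-1, 0:ny-1, 0:nz-1))
    a = 0.0

    i = nx + 2   ! the index used in the question
    if (i >= lbound(a, 1) .and. i <= ubound(a, 1)) then
        a(i, :, :) = 0.4
    else
        ! Refuse the write instead of corrupting neighbouring memory.
        print '(a,i0,a,i0,a,i0,a)', 'index ', i, ' outside (', &
            lbound(a, 1), ':', ubound(a, 1), '); write skipped'
    end if

    deallocate(a)
end program guarded_write
```

With the write skipped, the DEALLOCATE at the end operates on an untouched heap.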

Upvotes: 1

albapa

Reputation: 261

I'd expect the reason is that there is no run-time check for reading/writing to the array out of bounds. If you compile with -check bounds I suppose it'll complain, something like

forrtl: severe (408): fort: (2): Subscript #1 of the array A has value 12 which is greater than the upper bound of 9

So when it doesn't perform the run-time check, it happily writes to the memory where that index would be, overwriting whatever is there. In this case that must be something that describes the array itself (remember, in Fortran allocatable arrays are much more than memory addresses), so when deallocating, the runtime gets wrong information about what and where to deallocate.

If you were trying to write to a memory region which is outside of what's allocated to your executable by the operating system, you'd get a segfault, so this is something similar to that.

EDIT: You are essentially asking how the compiler deals with allocatable arrays. In C, for example, when you allocate, all you do is ask for a contiguous chunk of memory of the right size; you are told where it is, and then you have to keep track of how long that space actually is yourself.

In Fortran, it's different. When you allocate a variable, you can, for example, query its length, shape etc. This has to be stored somewhere. How and where it is stored is completely compiler dependent. I have no idea how it is implemented in ifort, but I'd imagine each allocatable variable has a header, i.e. a reserved space, where all the information related to the shape is stored, and the actual elements of the array come after that, in a contiguous space.

When you say a(Nx+2,:,:) your code works out, based on the header, what area of memory you want to write to or read from, and then does it. Perhaps in your case this operation corrupted the header itself, and when you tried to deallocate the variable, your code might have concluded that it should deallocate the space in memory holding that cute cat photo you are currently browsing. This upset the operating system, which told your code to stop. Or it might have concluded that it should deallocate a negative chunk of memory and said: WHAT????

Upvotes: 1
