Reputation: 2649

stack growth and mmap stack?

I am reading this page about memory overcommit, and it mentions

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge you MUST mmap your stack for the 
largest size you think you will need. For typical stack usage this does
not matter much but it's a corner case if you really really care

When a program is compiled (by gcc, for example), the size limit of the stack is defined (I remember there is a gcc parameter to adjust it). Then, inside the program, we can keep allocating on the stack.

A few questions:

What does "stack growth" mean in this context? Does it mean that if a C program keeps allocating and deallocating on the stack, mremap() will sometimes be called behind the scenes? And why, if the size limit of the stack has been defined at compile time?
How can we mmap the stack?

Upvotes: 1

Answers (1)

Blabbo the Verbose

Reputation: 104

The "magic" here is the behaviour of the MAP_GROWSDOWN flag, which the Linux kernel implements for memory that a process requests via the mmap() system call. The same behaviour is often used for the initial stack (the stack of the first thread in a process, set up when the process is first executed).

So, while new processes do typically get a MAP_GROWSDOWN stack by default, a process can manage its own stack as well. If the process creates new threads, it has to create stacks for them. (Currently, pthread_create() creates a fixed-size stack (of default maximum size, or sized as directed in the pthread_attr_t attribute block if specified), not a MAP_GROWSDOWN stack.)

The way the Linux kernel implements a MAP_GROWSDOWN memory mapping is that the actual memory is preceded by an extra page, called a "guard page". (On x86-64, pages are aligned units of 4096 bytes, but other page sizes exist; at run time, use sysconf(_SC_PAGESIZE) to obtain the size in bytes.)
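As a minimal sketch of the sysconf(_SC_PAGESIZE) query mentioned above, the following reads the page size at run time and rounds a byte count up to whole pages. The names page_size() and round_up_to_pages() are made up for this illustration; they are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns the page size in bytes, or 0 if sysconf() fails. */
size_t page_size(void)
{
    const long size = sysconf(_SC_PAGESIZE);
    return (size > 0) ? (size_t)size : 0;
}

/* Hypothetical helper: rounds 'bytes' up to a whole number of pages. */
size_t round_up_to_pages(size_t bytes)
{
    const size_t page = page_size();
    assert(page > 0);
    return ((bytes + page - 1) / page) * page;
}
```

Rounding sizes this way is handy when experimenting with mmap(), since mappings are always managed in whole pages.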

Whenever the guard page is first accessed, the kernel converts it to a standard page (same as the other pages in that same mapping), and creates a new guard page just below (at the next smaller page address). If there is something already mapped at those virtual addresses, the mapping is not changed, and the process will receive a SIGSEGV (segmentation violation) signal. Thus, only the amount of available address space (and indirectly, available memory) limits the growth of such stacks.

This also means that using local arrays greater than a page in size can lead to a SIGSEGV, if relying on MAP_GROWSDOWN automatic stack growth. It is therefore much more reliable to use dynamic memory management in C (malloc()/realloc()/free(), and interfaces like getline() and asprintf()) than to rely on large on-stack fixed-size arrays.

Essentially, as long as the stack elements are at most a page in size, such stacks will automatically grow as needed.

The "implicit mremap" in the quoted text thus only applies to the initial thread, because its stack is mapped with the MAP_GROWSDOWN flag; the phrase itself refers to this automatic downward growth in page-sized units.

If your process does many separate mmap() calls for different kinds of allocations (say, mapping files into memory), it is possible that the resulting mappings will be located such that the growth of the MAP_GROWSDOWN mappings is limited to less than what the process expects. (The addresses given by the kernel are at least somewhat randomized, for security purposes.)

The suggestion to mmap the stack for the largest size one might need means that one can (I'm not sure I agree with "MUST"), near the beginning of the program, use mremap() to convert the MAP_GROWSDOWN mapping to a larger, fixed-size mapping; typically, to the size reported by getrlimit(RLIMIT_STACK,). Because this essentially allocates the address space, but does not populate the pages with actual RAM until they are first accessed, the main cost is the kernel metadata (page tables and such).
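To see what size getrlimit(RLIMIT_STACK, ...) actually reports, something like the following sketch can be used. The function name stack_soft_limit() and its return convention are assumptions for this example only; the sketch just queries the limit and does not remap anything.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Stores the soft stack size limit, in bytes, in *bytes.
   Returns 0 on success, 1 if the limit is unlimited (RLIM_INFINITY,
   *bytes is left untouched), or -1 if getrlimit() fails (errno is set). */
int stack_soft_limit(rlim_t *bytes)
{
    struct rlimit limit;

    assert(bytes != NULL);

    if (getrlimit(RLIMIT_STACK, &limit) == -1)
        return -1;

    if (limit.rlim_cur == RLIM_INFINITY)
        return 1;

    *bytes = limit.rlim_cur;
    return 0;
}
```

On many Linux systems the soft limit is 8 MiB by default, but that is a configuration detail (see ulimit -s in your shell), so it is better to query it than to assume it.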

It is possible that the C runtime provided by your compiler already does this (to the size reported by getrlimit(RLIMIT_STACK, )) as part of setting up the runtime environment for C (in crt*.o or libgcc*, for example). I haven't checked.

If one wants to, for example when creating a new thread, one can use mmap() (say, mmap((void *)0, size_in_bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK | MAP_GROWSDOWN, -1, 0)) to allocate whatever stack one wants, then use pthread_attr_init() to initialize a thread attribute set, put the address and size of the stack into that thread attribute using pthread_attr_setstack(), and supply a pointer to that thread attribute set as the second parameter to pthread_create(). The created thread will then use that stack.
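The steps above could look roughly like the following sketch. Note the assumptions: the 1 MiB size, the worker() function, and the name run_on_mapped_stack() are all made up for illustration, and MAP_GROWSDOWN is deliberately left out here so that the thread gets a plain fixed-size region (add it to the flags if you want to experiment with the growing variant).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

#define THREAD_STACK_BYTES ((size_t)1 << 20)  /* 1 MiB; an arbitrary choice */

/* Trivial thread function: sets the int pointed to by arg to 1. */
static void *worker(void *arg)
{
    int *flag = arg;
    *flag = 1;
    return NULL;
}

/* Runs worker() in a new thread whose stack we mmap() ourselves.
   Returns 0 on success, -1 if mmap() fails, or a positive error number
   from the pthread functions. */
int run_on_mapped_stack(int *flag)
{
    pthread_attr_t attr;
    pthread_t      thread;
    void          *stack;
    int            result;

    assert(flag != NULL);

    stack = mmap(NULL, THREAD_STACK_BYTES, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    result = pthread_attr_init(&attr);
    if (!result) {
        /* pthread_attr_setstack() takes the lowest address of the region. */
        result = pthread_attr_setstack(&attr, stack, THREAD_STACK_BYTES);
        if (!result) {
            result = pthread_create(&thread, &attr, worker, flag);
            if (!result)
                result = pthread_join(thread, NULL);
        }
        pthread_attr_destroy(&attr);
    }

    /* No thread is using the region any more, so it can be unmapped. */
    munmap(stack, THREAD_STACK_BYTES);
    return result;
}
```

Compile with -pthread. The stack is unmapped only after pthread_join() has returned (or the thread was never created), because unmapping a stack that a thread is still running on would crash it.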

Modifying the currently used stack is much trickier, and is best done in the C runtime (in machine code, written in assembly) before the actual compiled C code is run in the process. In C, it can be done via getcontext()/setcontext(), by creating a new context (as if it was a new thread), setting up a new stack for it, switching to the new context, and then freeing the old stack.

In many cases, signal handlers are set to use a separate stack, by calling sigaltstack(). This is very useful, because then signals due to e.g. stack overflow can still be acted upon.
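A minimal sketch of the sigaltstack() setup, assuming a 64 KiB alternate stack (an arbitrary size chosen for this example) and hypothetical function names. The handler only checks whether one of its own local variables lies inside the alternate stack, to show that the handler really runs there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_BYTES ((size_t)64 * 1024)  /* arbitrary size for this sketch */

static char *alt_stack = NULL;
static volatile sig_atomic_t ran_on_alt_stack = 0;

static void on_signal(int signum)
{
    char marker;  /* lives on whichever stack the handler is running on */
    const uintptr_t here = (uintptr_t)&marker;
    const uintptr_t base = (uintptr_t)alt_stack;

    (void)signum;
    if (here >= base && here < base + ALT_STACK_BYTES)
        ran_on_alt_stack = 1;
}

/* Registers an alternate signal stack and installs on_signal() for signum
   with SA_ONSTACK. Returns 0 on success, -1 on failure. */
int install_alt_stack_handler(int signum)
{
    struct sigaction act;
    stack_t          ss;

    assert(signum > 0);

    alt_stack = malloc(ALT_STACK_BYTES);
    if (!alt_stack)
        return -1;

    ss.ss_sp = alt_stack;
    ss.ss_size = ALT_STACK_BYTES;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) == -1) {
        free(alt_stack);
        alt_stack = NULL;
        return -1;
    }

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_handler = on_signal;
    act.sa_flags = SA_ONSTACK;  /* run the handler on the alternate stack */
    if (sigaction(signum, &act, NULL) == -1)
        return -1;  /* the alternate stack stays registered */

    return 0;
}

/* Returns 1 if on_signal() has run on the alternate stack, 0 otherwise. */
int handler_ran_on_alt_stack(void)
{
    return ran_on_alt_stack;
}
```

For example, after install_alt_stack_handler(SIGUSR1) succeeds, calling raise(SIGUSR1) should make handler_ran_on_alt_stack() return 1. For catching actual stack overflows one would install the handler for SIGSEGV instead.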

Finally, recall that in Linux, /proc/PID/maps describes all existing mappings for process PID. For the process itself, you can always use /proc/self/maps. You might find the following dump_maps() function useful when experimenting with this stuff:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

/* Returns 0 if memory mappings printed to standard output,
   an errno error code if an error occurs.
*/
int dump_maps(void)
{
    FILE *in;
    int   ch;

    in = fopen("/proc/self/maps", "r");
    if (!in) {
        const int saved_errno = errno;
        fprintf(stderr, "Cannot open /proc/self/maps: %s.\n", strerror(saved_errno));
        return errno = saved_errno;
    }

    printf("  MinAddress-MaxAddress  Perms Offset  Device   Inode                    Pathname-or-Description\n");

    /* Yes, this is the slowest possible way to copy a file to standard output,
       but it should not matter for this use case.  The KISS principle. */
    while ((ch = getc(in)) != EOF)
        putchar(ch);

    putchar('\n');

    fclose(in);
    return 0;
}

int main(void)
{
    dump_maps();

    return EXIT_SUCCESS;
}

For further info on /proc/self/maps and other /proc pseudofiles, see man 5 proc. (They're not files in the sense of existing on any storage device; they are generated by the kernel as they are accessed, and are a very efficient interface for this kind of stuff.)

Upvotes: 5
