Joshua David

Reputation: 53

Segmentation fault when reading through a pointer stored in shared memory in a child process

OVERVIEW

I have a program that needs to have shared state between several processes (probably 80; I'm working on an embarrassingly parallel problem on a server with 80 cores). Ideally, I would be able to allocate this shared memory in such a way that I could expand the amount of shared state between these processes.

My suspicion is that it's failing because the pointers don't point to actual memory, so if the return value of mmap in one process is 0xDEADBEEF, that doesn't mean that 0xDEADBEEF will point to the same section of memory in another process. However, I know next to nothing about C programming, so that suspicion could easily be wrong.

Could anyone tell me if my suspicion is correct? If so, what should I be doing for shared state instead? The server would take at least 18 days per dataset without using all the cores, and we have quite a few datasets, so giving up on parallel computing is not really an option. However, I am willing to switch from processes to threads or something similar if that would help (I don't know how one would do that in C though). Thanks in advance for your help.

Below is some working and non-working sample code, and the results from gdb.

borked.h

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Expandable array, doubles memory allocation when it runs out of space.
struct earr {
    int *vals;
    int capacity;
    int length;
};

void *shared_calloc(size_t nmemb, size_t size) {
    void *mem = mmap(NULL, nmemb * size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if(mem == MAP_FAILED) {
        fprintf(stderr, "mmap: %s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }
    return memset(mem, 0, nmemb * size);
}

void shared_free_size(void *mem, size_t nmemb, size_t size) {
    if(mem) munmap(mem, nmemb * size);
}

struct earr *create_earr() {
    struct earr *a = shared_calloc(1, sizeof(struct earr));
    a->length = 0;
    a->capacity = 16;
    a->vals = shared_calloc(a->capacity, sizeof(int));
    return a;
}

void earr_expand(struct earr *a) {
    int *new_vals = shared_calloc(a->capacity * 2, sizeof(int));
    memcpy(new_vals, a->vals, a->capacity * sizeof(int));
    a->vals = new_vals;
    a->capacity *= 2;
}

void earr_insert(struct earr *a, int val) {
    if(a->length >= a->capacity) earr_expand(a);
    a->vals[a->length] = val;
    a->length++;
}

int earr_lookup(struct earr *a, int index) {
    return a->vals[index];
}

working.c

#include "borked.h"

int main(void) {
    struct earr *a = create_earr();
    int i;
    int size = 0x10000;
    for(i = 0; i < size; i++) {
        earr_insert(a, i);
    }
    for(i = 0; i < size; i++) {
        earr_lookup(a, i);
    }
    return EXIT_SUCCESS;
}

broken.c

#include "borked.h"

int main(void) {
    struct earr *a = create_earr();
    int i;
    pid_t pid;
    int size = 0x10000;
    if(0 == (pid = fork())) {
        for(i = 0; i < size; i++) {
            earr_insert(a, i);
        }
    } else {
        int status;
        waitpid(pid, &status, 0);
        for(i = 0; i < size; i++) {
            earr_lookup(a, i);
        }
    }
    return EXIT_SUCCESS;
}

GDB debugging

$ gdb broken
...
(gdb) run
Starting program: /path/to/broken

Program received signal SIGSEGV, Segmentation fault.
0x08048663 in earr_lookup (a=0xb7fda000, index=0) at /path/to/borked.h:46
46      return a->vals[index];
(gdb) x/3x 0xb7fda000
0xb7fda000: 0xb7da6000  0x00010000  0x00010000
(gdb) x/x 0xb7da6000
0xb7da6000: Cannot access memory at address 0xb7da6000

Upvotes: 2

Views: 1769

Answers (2)

Nominal Animal

Reputation: 39366

Yes, the address of the shared memory varies from process to process.

The solution in general is to use indexes instead of pointers. In the OP's example code, one could use a structure with a C99 flexible array member to describe the shared memory map, something like this:

struct shared_mmap {
    unsigned int  size;
    unsigned int  used;
    int           data[];
};

The data element type itself can be a structure, of course.

However, growing an anonymous memory map with mremap() will give you process-local new pages; the new pages won't be shared between the processes. You need to use either a file backing (instead of anonymous shared memory), or better yet, POSIX shared memory.

(If you have sufficient amount of RAM available, the file backing is not necessarily slower, as the file pages will typically be in page cache. File backing has the interesting property of allowing a computation to be stopped and continued with same or even different number of processes, obviously depending on the shared memory map structure. If you use MAP_SHARED | MAP_NORESERVE, you can use a memory map much larger than available RAM + swap -- but if you run out of disk space or quota, your process will receive (and die from) a SIGBUS signal.)

Although on x86 and x86-64 int loads and stores are atomic, you'll most likely end up needing (pthread) mutexes and/or the atomic built-ins provided by GCC 4.7 and later (older GCC and ICC et al. provide the legacy __sync built-ins).

Finally, if you have an unordered set of ints, I'd consider using a fixed-size memory map (536,870,912 bytes, i.e. 2^32 bits, in Linux) instead, and atomic bitwise operators:

#include <limits.h>

#define UINT_BITS (CHAR_BIT * sizeof (unsigned int))
#define ULONG_BITS (CHAR_BIT * sizeof (unsigned long))

/* Pointer to shared memory of (1<<UINT_BITS)/CHAR_BIT chars */
static unsigned long *shared;

static inline unsigned int shared_get(unsigned int i)
{
    return !!(__atomic_load_n(shared + (i / ULONG_BITS), __ATOMIC_SEQ_CST)
              & (1UL << (i % ULONG_BITS)));
}

static inline void shared_set(unsigned int i)
{
    __atomic_fetch_or(shared + (i / ULONG_BITS), 1UL << (i % ULONG_BITS), __ATOMIC_SEQ_CST);
}

static inline void shared_clear(unsigned int i)
{
    __atomic_fetch_and(shared + (i / ULONG_BITS), ~(1UL << (i % ULONG_BITS)), __ATOMIC_SEQ_CST);
}

In any case, the memory map length must be a multiple of sysconf(_SC_PAGESIZE), which you should evaluate at run time.

(It is a power of two on all known architectures, so reasonably large powers of two, currently 2^21 and larger, are multiples of the page size on all currently supported Linux architectures.)

Questions?

Upvotes: 0

nobody

Reputation: 20174

Your approach is indeed broken. When one of the child processes eventually calls earr_expand to increase the size of the array and ends up calling mmap, the resulting new mapping exists in that child process only - it does not get propagated to the parent process or other child processes.

You'll need to use a different technique, such as pre-allocating the full amount of memory before fork()ing, or using POSIX shared memory.

Upvotes: 3
