neverMind

Reputation: 1784

Problems running a program using shared memory; occasional segmentation fault; do shmmax and shmall have something to do with it?

Hi,

I have a program in which a master process spawns N workers, each of which inverts one row of an image, giving me an inverted image at the end. The program uses shared memory and POSIX semaphores (unnamed sems, more specifically), and I call shmctl with IPC_RMID plus sem_close and sem_destroy in the terminate() function. However, when I run the program several times, it sometimes gives me a segmentation fault, and it happens in the first shmget. I've already raised the shmmax value in the kernel, but I can't do the same for shmall, and I don't know why.

Can someone please help me? Why does this happen, and why not every time? The code seems fine: it gives me what I want, it's efficient, and so on. But sometimes I have to reboot Ubuntu to be able to run it again, even though I'm freeing the resources.

Please enlighten me!

EDIT:

Here are the 3 files needed to run the code + the makefile:
http://pastebin.com/JqTkEkPv
http://pastebin.com/v7fQXyjs
http://pastebin.com/NbYFAGYq

http://pastebin.com/mbPg1QJm

You have to run it like this: ./invert someimage.ppm outimage.ppm (please test with a small image for now)

Here are some values that may be important:

$ ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 262144
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1

$ ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

EDIT: the segfault is solved! I was allocating a pointer-to-pointer **array in shared memory, which was a little bit odd. So I've allocated a segment for a flat *array instead, and voilà. If you want, check the new code and comment.

Upvotes: 0

Views: 877

Answers (2)

Jens Gustedt

Reputation: 78903

Now that you posted your code we can say a bit more.

Without having read it all in detail, I think the cleanup phase of your main looks suspicious. In fact it seems to me that all your worker processes will perform that cleanup phase too.

After the fork you should more clearly distinguish what main does and what the workers do. Alternatives:

  • Your main process could just wait on the pids of the workers and only then do the rest of processing and cleanup.
  • All the worker processes could return in main after the call to worker.
  • Call exit at the end of the worker function.

Edit after your code update:

I still think a better solution would be to do a classical wait for all the child processes.

Now let's look into your worker processes. In fact these never terminate; there is no break statement in the while (1) loop. I think what is happening is that once there is no more work to be done

  • the worker is stuck in sem_wait(sem_remaining_lines)
  • your main process gets notified of the termination
  • it destroys the sem_remaining_lines
  • the worker returns from sem_wait and continues
  • since mutex3 is also already destroyed (or maybe even unmapped) the wait on it returns immediately
  • now it tries to access the data, and depending on how far the main process got on destruction the data is mapped or not and the worker crashes (or not)

As you can see you have many problems in there. What I would do to clean up this mess is

  • waitpid the workers before destroying the shared data
  • sem_trywait instead of the 1 in while (1). But perhaps I didn't completely understand your control flow. In any case, give the workers a termination condition.
  • check all return values from system functions, in particular the sem_t family. These can be interrupted by signals, so you definitely must check for EINTR on them.

Upvotes: 1

Jens Gustedt

Reputation: 78903

If all your POSIX sem_t semaphores are unnamed, you should only use sem_init and sem_destroy on them, and never sem_close (which is for named semaphores obtained with sem_open).

Upvotes: 3
