zhao

Reputation: 232

What happens if sem_init() is called twice?

The man page of sem_init() says "Initializing a semaphore that has already been initialized results in undefined behavior." Why is that and what exactly will happen on Linux?

This doesn't make sense to me, because when you call sem_init() for the first time, the (uninitialized) sem_t could happen to have exactly the same content as an initialized sem_t. If the manual is correct, then sem_init() simply doesn't work.

Upvotes: 4

Views: 2959

Answers (2)

R.. GitHub STOP HELPING ICE

Reputation: 215261

On Linux, where semaphores are implemented without any system resources, sem_init just fills in the sem_t structure members, and so nothing bad will happen if it's called more than once. On other implementations, however, much worse things could happen.

If the sem_t is just a dummy object containing a pointer to an allocated object (note: this can't work for process-shared semaphores), you would leak memory by calling sem_init multiple times.
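A minimal sketch of that leak, assuming a hypothetical toy_sem_t whose init function heap-allocates its state. None of these names belong to POSIX or glibc, and the live_allocations counter exists only to make the leak visible.

```c
#include <stdlib.h>

/* Hypothetical implementation: the public object is only a handle to
   heap-allocated state. This is NOT how glibc implements sem_t. */
struct toy_sem_state { unsigned value; };
typedef struct { struct toy_sem_state *state; } toy_sem_t;

static int live_allocations = 0;   /* illustration only */

int toy_sem_init(toy_sem_t *sem, unsigned value)
{
    /* init cannot tell stack garbage from a valid pointer, so it has to
       overwrite sem->state unconditionally. */
    struct toy_sem_state *s = malloc(sizeof *s);
    if (s == NULL)
        return -1;
    s->value = value;
    sem->state = s;
    live_allocations++;
    return 0;
}

int toy_sem_destroy(toy_sem_t *sem)
{
    free(sem->state);
    sem->state = NULL;
    live_allocations--;
    return 0;
}

/* Initializes one object twice, destroys it once, and returns how many
   allocations made along the way are still live afterwards. */
int toy_double_init_leak(void)
{
    toy_sem_t sem;
    int before = live_allocations;

    if (toy_sem_init(&sem, 1) != 0)
        return -1;
    if (toy_sem_init(&sem, 1) != 0)   /* the first block is now unreachable */
        return -1;
    toy_sem_destroy(&sem);            /* frees only the second block */
    return live_allocations - before;
}
```

In this sketch toy_double_init_leak() reports one block that no later call can free, because the only pointer to it was overwritten by the second init.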

Similarly, if sem_t just contained a reference (like a file descriptor number) to a kernel-managed resource, you would leak these kernel resources by calling sem_init more than once.

Even worse, if the library implementation maintained a linked list of all instantiated semaphores using prev/next pointers inside the sem_t object (also not possible in the process-shared case), you would corrupt this list by calling sem_init on a sem_t that's already part of the list.
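The list corruption can be sketched the same way. This assumes a hypothetical reg_sem_t that links itself into a global registry on init; the type, the registry, and the helper functions are all invented for illustration.

```c
#include <stddef.h>

/* Hypothetical: every initialized semaphore is linked into a global
   doubly linked registry through pointers stored in the object itself. */
typedef struct reg_sem {
    unsigned value;
    struct reg_sem *prev, *next;
} reg_sem_t;

static reg_sem_t *registry_head = NULL;

void reg_sem_init(reg_sem_t *sem, unsigned value)
{
    sem->value = value;
    sem->prev = NULL;
    sem->next = registry_head;        /* push at the front of the list */
    if (registry_head != NULL)
        registry_head->prev = sem;
    registry_head = sem;
}

/* Walks the registry but gives up after `limit` nodes, so that a
   corrupted (cyclic) list cannot hang the caller. Returns nodes visited. */
size_t reg_count(size_t limit)
{
    size_t n = 0;
    for (reg_sem_t *p = registry_head; p != NULL && n < limit; p = p->next)
        n++;
    return n;
}

/* Returns 1 if initializing the same object twice made it point at itself. */
int reg_double_init_makes_cycle(void)
{
    static reg_sem_t sem;             /* static: stays valid while registered */

    registry_head = NULL;
    reg_sem_init(&sem, 1);
    reg_sem_init(&sem, 1);            /* sem is already the list head */
    return sem.next == &sem;
}
```

After the second init the node's next pointer refers to the node itself, so any unbounded traversal of the registry would never terminate; reg_count only returns because of its limit.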

The standard for POSIX semaphores allows a wide variety of implementation strategies, which may be needed on different types of systems (e.g. machines without an atomic compare-and-swap instruction, bare metal with no kernel, ...). It therefore leaves the behavior undefined so as not to impose requirements that might limit implementation choices.

Upvotes: 3

Alexander Gessler

Reputation: 46607

Why is that

Think of it from an API designer's point of view. A semaphore can be seen as an abstract object that is created, used, and eventually disposed of.

Now the task is to map it to C (or any other language). The semaphore implementation will need to acquire resources, possibly resources that are maintained by the operating system. The life cycle above makes a lot of sense.
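With the real POSIX calls, that life cycle might look like the sketch below. It assumes a platform that supports unnamed semaphores (Linux with glibc does; older toolchains may need -pthread when linking), and the function name sem_lifecycle_demo is made up for this example. Re-initializing is fine here because the semaphore is destroyed first.

```c
#include <semaphore.h>

/* Creates, uses, and disposes of a semaphore, then reuses the same sem_t
   for a second life cycle. Returns the value read from the re-initialized
   semaphore, or -1 if any call fails. */
int sem_lifecycle_demo(void)
{
    sem_t sem;
    int value = -1;

    /* create: pshared = 0 means shared between threads of this process */
    if (sem_init(&sem, 0, 1) != 0)
        return -1;

    /* use: take the semaphore and give it back */
    if (sem_wait(&sem) != 0 || sem_post(&sem) != 0) {
        sem_destroy(&sem);
        return -1;
    }

    /* dispose: after this the object is uninitialized again */
    if (sem_destroy(&sem) != 0)
        return -1;

    /* a destroyed semaphore may be initialized again */
    if (sem_init(&sem, 0, 3) != 0)
        return -1;
    if (sem_getvalue(&sem, &value) != 0)
        value = -1;
    sem_destroy(&sem);
    return value;
}
```

The point of the design is that sem_destroy gives the implementation a chance to release whatever sem_init acquired, so the second sem_init always starts from an object that owns nothing.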

The API is finalized, and a first implementation is made. Many corner cases or extra requirements come up soon. For example, whether sem_init can be called multiple times, given that the current implementation makes it trivial to allow it. Another one (maybe) is whether it should be possible to select if semaphores are shared between threads or between processes.

In each case, the API designer will have to weigh the trade-offs:

  • Is it an extra burden to the implementors of the API?
  • Can it be achieved otherwise? (i.e. does it have to be implemented at a library/system level)
  • Does it distract from the API's core functionality?
  • Is it considered idiomatic in the language of choice?
  • Is it considered a good pattern for APIs in general?
  • Is it easy to get wrong / does it pose a security risk?
  • ...

In this case, it seems allowing double initialization would get a "no" on most of these criteria. So the decision is made to not allow it. It probably still works with your particular implementation, compiler, and system, or even with the majority of implementations, compilers, and systems.

How to convey that? Well, you call it undefined behaviour in the manual and everybody knows not to do it. People with good working intuition for the environment can easily guess what the behaviour might be. Only a fool would rely on it, though.

the (uninitialized) sem_t could have exact content as an initialized sem_t

That is true. However, let us say the sem_t holds a pointer to a piece of heap memory that sem_init allocates using malloc. It is perfectly possible for an uninitialized sem_t to contain, by chance, the exact same pointer value, but the resource it corresponds to would not exist.

Upvotes: 2
