Reputation: 21
The purpose of my code is to fork two child processes that each increment a shared counter variable 1 million times. Here is my code:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ipc.h>
#include <sys/shm.h>
typedef struct
{
    int value;
} shared_mem;
shared_mem *counter;
/* these next two functions are run by the child processes to increment the counter */
void process1(void)
{
    int i;
    for (i = 0; i < 1000000; i++)
        counter->value++;
}
void process2(void)
{
    int i;
    for (i = 0; i < 1000000; i++)
        counter->value++;
}
/* The Main Body */
int main(void)
{
    key_t key = IPC_PRIVATE; /* shared memory key */
    int shmid;               /* shared memory ID */
    int pid1;                /* process id for child1 */
    int pid2;                /* process id for child2 */
    /* create a new shared memory segment */
    if ((shmid = shmget(key, sizeof(shared_mem), IPC_CREAT | 0666)) < 0)
    {
        perror("shmget");
        return 1;
    }
    /* attach the shared memory segment */
    if ((counter = (shared_mem *)shmat(shmid, NULL, 0)) == (shared_mem *)-1)
    {
        perror("shmat");
        return 1;
    }
    /* initialize the shared counter to 0 */
    counter->value = 0;
    /* fork child one here */
    pid1 = fork();
    if (pid1 == 0)
    {
        printf("I am child 1 with PID %d\n", getpid());
        process1();
        return 0; /* children must not fall through to the cleanup code below */
    }
    else
    {
        pid2 = fork();
        if (pid2 == 0)
        {
            printf("I am child 2 with PID %d\n", getpid());
            process2();
            return 0;
        }
        else
        {
            wait(NULL); /* wait for both children to finish */
            wait(NULL);
            printf("I am parent with PID %d\n", getpid());
            printf("Total counter value is: %d\n", counter->value);
        }
    }
    /* deallocate shared memory */
    if (shmctl(shmid, IPC_RMID, NULL) == -1)
    {
        perror("shmctl");
        return -1;
    }
    return 0;
}
The counter output hovers around 1 million, but shouldn't it hover around 2 million? I think I am misunderstanding something about the way the processes increment the counter. Thanks a lot in advance, and I apologize if the code is long; I was not sure what to include and what to leave out.
Upvotes: 0
Views: 158
Reputation: 126787
Incrementing a variable is not atomic by itself; unless instructed otherwise, the compiler may generate code that works like this:
load counter->value in a register
increment the register
move the incremented value back to counter->value
This is exactly the kind of code generated by gcc with optimizations disabled:
mov rax, QWORD PTR counter[rip] ; find out the address of counter->value
mov edx, DWORD PTR [rax] ; get its content in edx
add edx, 1 ; increment edx
mov DWORD PTR [rax], edx ; move it back to counter->value
(Although you may have a race condition even if the generated assembly for the increment were a single instruction - for example, even a plain inc DWORD PTR [rax] on x86 is not atomic on a multi-core machine unless it has the lock prefix.)
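The same three steps, written out as plain C, look like this (just an illustration of what the compiler emits; unsafe_increment and tmp are made-up names standing in for the generated code and the register):

void unsafe_increment(shared_mem *counter)
{
    int tmp = counter->value; /* load counter->value into a "register"  */
    tmp = tmp + 1;            /* increment it                           */
    counter->value = tmp;     /* store it back - the other process can  */
                              /* be scheduled between any two steps     */
}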
Now, if you have two processes that keep trying to increment the variable concurrently, most often you'll get a sequence of operations somewhat resembling this:
Process A                               Process B
load counter->value in a register
                                        load counter->value in a register
increment the register
                                        increment the register
move the register to counter->value
                                        move the register to counter->value
Since both increments happened in a separate register starting from the same value, the net result is that counter->value appears to have been incremented just once, not twice. (This is just one possible example; you can conceive of many other sequences that skip as many increments as you like - think of process A being suspended between the load and the store while process B keeps going for many iterations.)
The solution is to use atomic operations to act upon the shared value; on gcc, you have several atomic builtins available, which expand to the correct assembly code that performs the described operation atomically, i.e. without any risk of an interleaving like the one described above.
In this particular case, you should replace counter->value++ with something like __sync_add_and_fetch(&counter->value, 1). The generated code changes to:
mov rax, QWORD PTR counter[rip] ; find out the address of counter->value
lock add DWORD PTR [rax], 1 ; atomically increment the pointed value
and you should see that the counter does reach 2000000 as expected.
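Applied to the code in the question, each child's loop would then look something like this (a minimal sketch; only the increment itself changes):

void process1(void)
{
    int i;
    for (i = 0; i < 1000000; i++)
        __sync_add_and_fetch(&counter->value, 1); /* atomic read-modify-write */
}

(On a C11 compiler you could equivalently make the field an _Atomic int and use atomic_fetch_add from <stdatomic.h>, the standardized counterpart of this GCC builtin.)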
Notice however that atomic operations are quite limited: CPUs typically support them only on a restricted set of types (usually integers no larger than the native word size), and only a few primitives are available or easily assembled from them (say, atomic swap, atomic compare-and-swap, atomic increment and the like). For this reason, whenever you need to guarantee that some arbitrary block of code always executes atomically, you have to resort to mutexes and other synchronization primitives (which are generally built on top of atomic operations).
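For instance, in this program you could protect the counter with a process-shared mutex stored inside the same shared segment (extending shared_mem with a lock field). Here is a minimal sketch assuming POSIX pthreads (compile with -pthread); init_counter and locked_increment are illustrative helper names, not part of any library:

#include <pthread.h>

typedef struct
{
    pthread_mutex_t lock;
    int value;
} shared_mem;

/* the parent initializes the mutex once, right after attaching the segment */
void init_counter(shared_mem *counter)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    /* PROCESS_SHARED is needed because the children are separate
       processes, not threads within one process */
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&counter->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    counter->value = 0;
}

/* each child wraps the increment in a lock/unlock pair */
void locked_increment(shared_mem *counter)
{
    pthread_mutex_lock(&counter->lock);
    counter->value++; /* only one process at a time can execute this */
    pthread_mutex_unlock(&counter->lock);
}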
Upvotes: 1