Gaurav Pathak

Reputation: 1143

Why isn't gcc optimizing the global variable?

I am trying to understand the behavior of volatile and compiler optimization in C through an example.

For this, I referred to:

Where to use volatile?

Why is volatile needed in C?

https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming

Each of the above posts has at least one answer involving a signal handler, so I wrote a simple program to implement one and observe the behavior on Linux, purely for learning.

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <pthread.h>

int counter = 0;

void *thread0_func(void *arg)
{
    printf("Thread 0\n");
    while(1)
    {

    }
    return NULL;
}

void *thread1_func(void *arg)
{
    printf("Thread 1\n");
    while(counter == 0)
    {
        printf("Counter: %d\n", counter);
        usleep(90000);
    }
    return NULL;
}

void action_handler(int sig_no)
{
    printf("SigINT Generated: %d\n",counter);
    counter += 1;
}

int main(int argc, char **argv)
{
    pthread_t thread_id[2];

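    struct sigaction sa = {0};  /* zero-initialize so sa_flags is not indeterminate */

    sa.sa_handler = action_handler;
    sigemptyset(&sa.sa_mask);   /* block no additional signals while the handler runs */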

    if(sigaction(SIGINT, &sa, NULL))
        perror("Cannot Install Sig handler");


    if(pthread_create(&thread_id[0], NULL, thread0_func, NULL))
    {
        perror("Error Creating Thread 0");
    }
    if(pthread_create(&thread_id[1], NULL, thread1_func, NULL))
    {
        perror("Error Creating Thread 1");
    }
    else
    {

    }
    while(1)
    {
        if(counter >= 5)
        {
            printf("Value of Counter is more than five\n");
        }
        usleep(90000);
    }
    return (0);
}

This code is only meant for learning and understanding.

I tried compiling the code using:
gcc -O3 main.c -o main -pthread

However, the compiler does not appear to optimize accesses to the global variable counter.
I expected thread1_func to loop forever and the if (counter >= 5) check to never be true.

What am I missing here?

GCC Version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)

Upvotes: 1

Views: 1279

Answers (1)

Petr Skocik

Reputation: 60056

Your tests on the value of counter (the while condition and the if) are interspersed with calls to usleep and printf. These are opaque library calls: the compiler cannot see through them, so it has to assume they may access the external variable counter, and it must therefore reload counter after each call.

If you move these calls out, the code gets optimized as you expect:

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <pthread.h>

int counter = 0;

void *thread0_func(void *arg)
{
    printf("Thread 0\n");
    while(1)
    {

    }
    return NULL;
}

void *thread1_func(void *arg)
{
    printf("Thread 1\n");
    unsigned i=0;
    while(counter == 0)
    {
       i++;
    }
    printf("Thread 1: %d, i=%u\n", counter, i);
    return NULL;
}

void action_handler(int sig_no)
{
    printf("SigINT Generated: %d\n",counter);
    counter += 1;
}

int main(int argc, char **argv)
{
    pthread_t thread_id[2];

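    struct sigaction sa = {0};  /* zero-initialize so sa_flags is not indeterminate */

    sa.sa_handler = action_handler;
    sigemptyset(&sa.sa_mask);   /* block no additional signals while the handler runs */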

    if(sigaction(SIGINT, &sa, NULL))
        perror("Cannot Install Sig handler");


    if(pthread_create(&thread_id[0], NULL, thread0_func, NULL))
    {
        perror("Error Creating Thread 0");
    }
    if(pthread_create(&thread_id[1], NULL, thread1_func, NULL))
    {
        perror("Error Creating Thread 1");
    }
    else
    {

    }
    while(1)
    {
        if(counter >= 5)
        {
            printf("Value of Counter is more than five\n");
        }
        usleep(90000);
    }
    return (0);
}

Even if you make the counter variable static, the compiler will still not optimize. Although an external library definitely won't see the variable, the external call could theoretically involve a mutex lock, which would allow another thread to change the variable without a data race. Neither usleep nor printf is a wrapper around a mutex lock, but the compiler doesn't know that, nor does it do inter-thread optimization. It therefore has to be conservative and reload counter after the call, and that reload is what prevents the optimization you expect.
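To make the reload-versus-hoist distinction concrete, here is a rough sketch of the two forms a polling loop can take. The function names and the limit parameter are hypothetical; limit exists only so that both versions terminate. In a single thread with no opaque calls in the loop body the two forms are indistinguishable, which is why the optimizer is allowed to pick the second one:

```c
/* Loop as written in the source: the flag is re-read on every iteration. */
unsigned spin_reload(const int *flag, unsigned limit)
{
    unsigned i = 0;
    while (*flag == 0 && i < limit)
        i++;
    return i;
}

/* Loop as the optimizer may effectively rewrite it when nothing in the
   body could legally change *flag: a single load, hoisted out of the loop. */
unsigned spin_hoisted(const int *flag, unsigned limit)
{
    const int seen = *flag;   /* the only load of the flag */
    unsigned i = 0;
    while (seen == 0 && i < limit)
        i++;
    return i;
}
```

If another thread or a signal handler changed *flag during the loop, only spin_reload would notice. An opaque call such as usleep or printf in the body forces the compiler to keep the first form.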

Of course, a simpler explanation is that your program has undefined behavior if the signal handler executes: you should have made counter a volatile sig_atomic_t, and you should have synchronized inter-thread access to it with either _Atomic or a mutex. In a program with undefined behavior, anything is possible.

Upvotes: 4
