vikky.rk

Reputation: 4149

Concurrency: Cache Coherence Issue or Compiler Optimization?

From my understanding, if the hardware supports cache coherence on a multi-processor system, then writes to a shared variable will be visible to threads running on other processors. To test this, I wrote a simple program in both Java and pthreads:

public class mainTest {

    public static int i=1, j = 0;
    public static void main(String[] args) throws InterruptedException {

    /*
     * Thread1: Sleeps for 30ms and then sets i to 0
     */
    (new Thread(){
        public void run(){
            synchronized (this) {
                try{
                       Thread.sleep(30);
                       System.out.println("Thread1: j=" + mainTest.j);
                       mainTest.i=0;
                   }catch(Exception e){
                       throw new RuntimeException("Thread1 Error");
                }
            }
        }
    }).start();

    /*
     * Thread2: Loops while i==1 and then exits.
     */
    (new Thread(){
        public void run(){
            synchronized (this) {
                while(mainTest.i==1){
                    //System.out.println("Thread2: i = " + i); Comment1
                    mainTest.j++;
                }
                System.out.println("\nThread2: i!=1, j=" + j);
            }
        }
    }).start();

   /*
    *  Sleep the main thread for 30 seconds, instead of using join. 
    */
    Thread.sleep(30000);
    }
}




/* pThreads */

#include<stdio.h>
#include<pthread.h>
#include<assert.h>
#include<time.h>
#include<unistd.h>   /* for sleep() */

int i = 1, j = 0;

void * threadFunc1(void * args) {
    sleep(1);
    printf("Thread1: j = %d\n",j);
    i = 0;
    return NULL;
}

void * threadFunc2(void * args) {
    while(i == 1) {
        //printf("Thread2: i = %d\n", i);
        j++;
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    int res;
    printf("Main: creating threads\n");

    res = pthread_create(&t1, NULL, threadFunc1, "Thread1"); assert(res==0);
    res = pthread_create(&t2, NULL, threadFunc2, "Thread2"); assert(res==0);

    res = pthread_join(t1,NULL); assert(res==0);
    res = pthread_join(t2,NULL); assert(res==0);

    printf("i = %d\n", i);
    printf("Main: End\n");
    return 0;
}    

I noticed that the pthreads program always ends (I tested it with different sleep times for thread1). However, the Java program ends only occasionally; most of the time it does not end. If I uncomment Comment1 in the Java program, it always ends. It also always ends if I declare the variable volatile.

So my confusion is,

  1. if cache coherence is done in hardware, then 'i=0' should be visible to other threads unless compiler optimized the code. But if compiler optimized the code, then I don't understand why the thread ends sometimes and doesn't sometimes. Also adding a System.out.println seems to change the behavior.

  2. Can anyone see a compiler optimization that Java does (which is not done by C compiler), which is causing this behavior?

  3. Is there something additional that the Compiler has to do, to get Cache coherence even if the hardware already supports it? (like enable/disable)

  4. Should I be using Volatile for all shared variables by default?

Am I missing something? Any additional comments are welcome.

Upvotes: 1

Views: 591

Answers (5)

Gray

Reputation: 116878

Your specific problem is that the 2nd thread needs to synchronize memory after i has been set to 0 by the 1st thread. Both threads synchronize on this which, as @Peter and @Marko have pointed out, refers to a different object in each thread. It is possible for the 2nd thread to enter the while loop before the first thread sets i = 0. No additional memory barrier is crossed inside the while loop, so the 2nd thread's view of the field is never updated.

If I uncomment the Comment1 in java program, then it ends all the time.

This works because the underlying System.out PrintStream is synchronized, which causes a memory barrier to be crossed. Memory barriers force synchronization of memory between the thread and central memory and ensure ordering of memory operations. Here's the PrintStream.println(...) source:

public void println(String x) {
    synchronized (this) {
        print(x);
        newLine();
    }
}

if cache coherence is done in hardware, then 'i=0' should be visible to other threads unless compiler optimized the code

You have to remember that each of the processors has both a few registers and a lot of per-processor cache memory. It is the cached memory which is the main issue here, not compiler optimizations.

Can anyone see a compiler optimization that Java does (which is not done by C compiler), which is causing this behavior?

The use of cached memory and memory operation reordering are both significant performance optimizations. Processors are free to change the order of operations to improve pipelining and they do not synchronize their dirty pages unless a memory barrier is crossed. This means that a thread can run asynchronously using local high-speed memory to significantly increase performance. The Java memory model allows for this and is vastly more complicated compared to pthreads.

Should I be using volatile for all shared variables by default?

If you expect thread #1 to update a field and thread #2 to see that update then yes, you will need to mark the field as volatile. Using Atomic* classes is often recommended and is required if you want to increment a shared variable (++ is two operations).

If you are doing multiple operations (such as iterating across a shared collection) then the synchronized keyword should be used.
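As a minimal sketch of the volatile / Atomic* suggestion above (the class and method names are made up for illustration and are not from the question), the following uses a volatile flag as the stop signal and an AtomicInteger for the counter that the question incremented with j++:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative only.
class VolatileFlagDemo {
    // volatile: a read of this field sees the most recent write by any thread.
    static volatile boolean running = true;
    // AtomicInteger makes the increment a single atomic operation.
    static final AtomicInteger counter = new AtomicInteger();

    static int runDemo() throws InterruptedException {
        running = true;
        counter.set(0);

        // Spins until it observes running == false.
        Thread spinner = new Thread(() -> {
            while (running) {
                counter.incrementAndGet();
            }
        });

        // Sleeps for 30ms and then clears the flag.
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(30);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            running = false;
        });

        spinner.start();
        stopper.start();
        stopper.join();
        spinner.join(); // returns because the volatile write becomes visible
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("spinner exited after " + runDemo() + " increments");
    }
}
```

The number of increments printed will vary from run to run; the point is only that the spinner thread exits.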

Upvotes: 3

Hong Zhou

Reputation: 649

If the expected behavior is for thread 2 to detect the change in the variable and terminate, the volatile keyword is definitely required. It allows the threads to communicate via the volatile variable. The compiler usually optimizes to fetch from cache, as it is faster than fetching from main memory.

Check out this awesome post, it will give you your answer: http://jeremymanson.blogspot.sg/2008/11/what-volatile-means-in-java.html

I believe in this case, it has nothing to do with cache coherence. As mentioned, it is a computer architecture feature, which should be transparent to a C/Java program. If volatile is not specified, the behaviour is undefined, and that's why sometimes the other thread sees the value change and sometimes it doesn't.

volatile has a different meaning in C and in Java: http://en.wikipedia.org/wiki/Volatile_variable

Depending on your C compiler, the program might get optimized and have the same effect as your Java program. So the volatile keyword is always recommended.

Upvotes: 1

MSN

Reputation: 54604

Cache coherency is a hardware level feature. How manipulating a variable maps to CPU instructions and indirectly to the hardware is a language/runtime feature.

In other words, setting a variable does not necessarily translate into CPU instructions that write to that variable's memory. A compiler (offline or JIT) can use other information to determine that it does not need to be written to memory.

Having said that, most languages with support for concurrency have additional syntax to tell the compiler that the data you are working with is intended for concurrent access. For many (like Java), it's opt-in.

Upvotes: 1

Peter Lawrey

Reputation: 533520

if cache coherence is done in hardware, then 'i=0' should be visible to other threads unless compiler optimized the code. But if compiler optimized the code, then I don't understand why the thread ends sometimes and doesn't sometimes. Also adding a System.out.println seems to change the behavior.

Note: javac does next to no optimization, so don't think in terms of static optimisations.

You are locking on different objects which are unrelated to the object you are modifying. As the field you are modifying is not volatile, the JVM optimiser is free to optimise it dynamically as it chooses, regardless of the support your hardware could otherwise provide.

As this is dynamic, it may or may not optimise the read of the field which you don't change in that thread.

Can anyone see a compiler optimization that Java does (which is not done by C compiler), which is causing this behavior?

The optimisation is most likely that the read is cached in a register or the code is eliminated completely. This optimisation typically takes about 10-30 ms so you are testing whether this optimisation has occurred before the program finishes.

Is there something additional that the Compiler has to do, to get Cache coherence even if the hardware already supports it? (like enable/disable)

You have to use the model correctly, forget about the idea that the compiler will optimise your code, and ideally use the concurrency libraries for passing work between threads.

public static void main(String... args) {
    final AtomicBoolean flag = new AtomicBoolean(true);
    /*
    * Thread1: Sleeps for 30ms and then sets flag to false
    */
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                Thread.sleep(30);
                System.out.println("Thread1: flag=" + flag);
                flag.set(false);
            } catch (Exception e) {
                throw new RuntimeException("Thread1 Error");
            }
        }
    }).start();

    /*
    * Thread2: Loops until flag is false and then exits.
    */
    new Thread(new Runnable() {
        @Override
        public void run() {
            long j = 0;
            while (flag.get())
                j++;
            System.out.println("\nThread2: flag=" + flag + ", j=" + j);
        }
    }).start();
}

prints

Thread1: flag=true

Thread2: flag=false, j=39661265

Should I be using Volatile for all shared variables by default?

Almost never. It would work if you have a single flag and you set it only once. However, using locking is more likely to be useful generally.
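To illustrate the locking alternative, here is a rough sketch (the class, field and method names are hypothetical, not taken from the question) in which both threads synchronize on the same lock object, unlike the question's code where each thread locked on its own this:

```java
// Hypothetical demo class; names are illustrative only.
class SharedLockDemo {
    private final Object lock = new Object();
    private boolean running = true; // guarded by lock
    private long count = 0;         // guarded by lock

    // Performs one increment; returns false once stop() has been called.
    boolean step() {
        synchronized (lock) {
            if (!running) {
                return false;
            }
            count++;
            return true;
        }
    }

    void stop() {
        synchronized (lock) {
            running = false;
        }
    }

    long count() {
        synchronized (lock) {
            return count;
        }
    }

    static long runDemo() throws InterruptedException {
        SharedLockDemo state = new SharedLockDemo();
        // Worker loops until step() reports that the flag was cleared.
        Thread worker = new Thread(() -> {
            while (state.step()) {
                // all the work happens inside step()
            }
        });
        worker.start();
        Thread.sleep(30);
        state.stop();   // same lock as step(), so the worker sees the change
        worker.join();
        return state.count();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker exited, count=" + runDemo());
    }
}
```

Because the flag and the counter are only ever touched while holding the same lock, the flag check and the increment in step() happen together as one unit, which a lone volatile flag would not give you.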

Upvotes: 5

Marko Topolnik

Reputation: 200158

The program will end if Thread 2 starts running after Thread 1 has already set i to 0. Using synchronized(this) may contribute to this somewhat because there's a memory barrier at each entry into a synchronized block, regardless of the lock acquired (you use disparate locks, so no contention will ensue).

Aside from this there may be other complicated interactions between the moment your code gets JITted and the moment Thread 1 writes 0, since this changes the level of optimization. Optimized code will normally read only once from the global var and cache the value in a register or similar thread-local location.
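The "read once and cache" effect can be sketched by writing the hoisted loop out by hand. This is only an illustration of the idea, not what any particular JIT emits: the names are hypothetical, the iteration cap exists only so the demo terminates, and the write performed inside the loop stands in for Thread 1's write so that the result is deterministic in a single thread.

```java
// Hypothetical single-threaded illustration of hoisting a field read.
class HoistedReadSketch {
    static int i = 1; // plain (non-volatile) field, as in the question

    // The loop as written: the field is re-read on every iteration.
    static long asWritten(long cap) {
        long j = 0;
        while (i == 1 && j < cap) {
            j++;
            if (j == 10) {
                i = 0; // stand-in for the other thread's write
            }
        }
        return j;
    }

    // The loop after a hand-applied hoist: the field is read once into a
    // local, so the later write to i is never noticed by the loop condition.
    static long hoisted(long cap) {
        int cached = i; // single read, kept in a local
        long j = 0;
        while (cached == 1 && j < cap) {
            j++;
            if (j == 10) {
                i = 0; // stand-in for the other thread's write
            }
        }
        return j;
    }

    public static void main(String[] args) {
        i = 1;
        System.out.println("as written: " + asWritten(1000)); // prints 10
        i = 1;
        System.out.println("hoisted:    " + hoisted(1000));   // prints 1000
    }
}
```

Without the cap, the hoisted form would spin forever, which matches the runs of the question's program that never end.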

Upvotes: 1
