Richard Schulze

Reputation: 113

OpenCL memory consistency

I have a question concerning the OpenCL memory consistency model. Consider the following kernel:

__kernel void foo() {
    __local int lmem[1];
    lmem[0]  = 1;   // store
    lmem[0] += 2;   // load + store by the same work-item
}

In this case, is any synchronization or memory fence necessary to ensure that lmem[0] == 3?

According to section 3.3.1 of the OpenCL specification,

within a work-item memory has load / store consistency.

To me, this says that the assignment will always be executed before the increment.

However, section 6.12.9 defines the mem_fence function as follows:

Orders loads and stores of a work-item executing a kernel. This means that loads and stores preceding the mem_fence will be committed to memory before any loads and stores following the mem_fence.

Doesn't this contradict section 3.3.1? Or maybe my understanding of load / store consistency is wrong? I would appreciate your help.

Upvotes: 1

Views: 214

Answers (1)

pmdj

Reputation: 23428

As long as only one work-item performs read/write access to a local memory cell, that work-item has a consistent view of it. Committing to memory using a barrier is only necessary to propagate writes to other work-items in the work-group. For example, an OpenCL implementation would be permitted to keep any changes to local memory in private registers until a barrier is encountered. Within the work-item, everything would appear fine, but without a barrier other work-items would never see these changes. This is how the phrase "committed to memory" should be interpreted in 6.12.9.
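
A minimal sketch of that distinction, assuming a 1-D work-group of at most 64 work-items (the kernel name, the out buffer, and the array size are made up for illustration and are not from the question):

```c
// Hypothetical kernel: each work-item owns one cell of local memory.
// Assumes get_local_size(0) <= 64.
__kernel void neighbor_sum(__global int *out) {
    __local int lmem[64];
    const size_t lid = get_local_id(0);
    const size_t n   = get_local_size(0);

    lmem[lid]  = 1;   // only this work-item touches lmem[lid] here
    lmem[lid] += 2;   // no fence needed: the work-item sees its own store (value is 3)

    // Required before any work-item reads a cell written by another one.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Reads the neighbor's cell, which was written before the barrier.
    out[get_global_id(0)] = lmem[lid] + lmem[(lid + 1) % n];
}
```

Dropping the barrier would leave the first two statements well-defined, but the read of the neighbor's cell would become a data race.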

Essentially, the interaction between local memory and barriers boils down to this:

Between barriers:

  1. Only one work-item is allowed read/write access to a local memory cell.
    OR
  2. Any number of work-items in a work-group are allowed read-only access to a local memory cell.

In other words, no work-item may read or write to a local memory cell which is written to by another work-item after the last barrier.
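
A work-group reduction is one common pattern that follows these rules phase by phase. The following is only a sketch, assuming a 1-D work-group whose size is a power of two and at most 64; the kernel name and the in/out buffers are hypothetical:

```c
// Hypothetical kernel: sums one value per work-item within a work-group.
// Assumes get_local_size(0) is a power of two and <= 64.
__kernel void group_sum(__global const int *in, __global int *out) {
    __local int scratch[64];
    const size_t lid = get_local_id(0);

    // Phase 0: every work-item writes only its own cell.
    scratch[lid] = in[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);

    for (size_t stride = get_local_size(0) / 2; stride > 0; stride /= 2) {
        // In this phase, cells [0, stride) are each read and written by
        // exactly one work-item; cells [stride, 2*stride) are only read.
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
        // All work-items reach this barrier, since the loop bound is uniform.
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0)
        out[get_group_id(0)] = scratch[0];
}
```

Between any two consecutive barriers, no cell is written by one work-item and accessed by another, which is exactly the condition stated above.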

Upvotes: 1
