puneet336

Reputation: 461

Difference between OpenMP's target and target data?

The target construct offloads a code region from the host to a target device. The variables p, v1 and v2 are explicitly mapped to the target device using the map clause. target data does the same.

Then what is implied by:

in regard to the "target data" construct?

In other words, what differences are there in the offloading mechanism among these three functions:

void vec_mult1(float *p, float *v1, float *v2, int N)
{
    int i;
    init(v1, v2, N);
#pragma omp target map(to: v1[0:N], v2[:N]) map(from: p[0:N])
#pragma omp parallel for
    for (i=0; i<N; i++)
        p[i] = v1[i] * v2[i];
    output(p, N);
}


void vec_mult2(float *p, float *v1, float *v2, int N)
{
    int i;
    init(v1, v2, N);
#pragma omp target data device(mic0) map(to: v1[0:N], v2[:N]) map(from: p[0:N])
    {
    //this code runs on accelerator card
#pragma omp target // if we omit it, what difference will it make?
#pragma omp parallel for
        for (i=0; i<N; i++)
            p[i] = v1[i] * v2[i];
    }
    output(p, N);
}

void vec_mult3(float *p, float *v1, float *v2, int N)
{
    int i;
    init(v1, v2, N);
#pragma omp target data map(to: v1[0:N], v2[:N]) map(from: p[0:N])
    {

        //target construct omitted
#pragma omp parallel for
        for (i=0; i<N; i++)
            p[i] = v1[i] * v2[i];
    }
    output(p, N);
}

I tried to execute them but I'm unable to notice significant differences between them.

Upvotes: 6

Views: 4060

Answers (2)

Hristo Iliev

Reputation: 74455

The target data construct only creates a device data environment that lasts for the extent of the region. It only sets the mapping between the variables in the device data environment and the data environment of the encountering task. The rationale behind having a separate construct is that in many cases it is desirable that certain data remains on the device instead of constantly being transferred to and from it.
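The point that target data only sets up the mapping can be checked directly. The following is a minimal sketch, assuming an OpenMP 4.5 or later compiler; probe_target_data is a hypothetical helper, not part of any API. It records omp_is_initial_device() once in the body of a target data region and once inside a nested target construct. If the program is built without OpenMP support, the pragmas are ignored and everything simply runs on the host.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical probe: reports where each part of a target data region runs.
 * Each *_on_host flag is set to 1 for the host and 0 for an offload device.
 * Returns the final value of a mapped counter after the region ends. */
int probe_target_data(int *body_on_host, int *region_on_host)
{
    int counter = 42;        /* mapped once by the target data construct */
    int body = 1, region = 1;

    #pragma omp target data map(tofrom: counter)
    {
        /* target data only creates the mapping: this statement runs on the host. */
#ifdef _OPENMP
        body = omp_is_initial_device();
#endif

        /* Only a nested target construct actually offloads code.
         * counter is already present, so mapping it again copies nothing. */
        #pragma omp target map(tofrom: counter) map(from: region)
        {
#ifdef _OPENMP
            region = omp_is_initial_device();
#endif
            counter += 1;
        }
    }   /* counter is copied back to the host when target data ends */

    *body_on_host = body;
    *region_on_host = region;
    return counter;
}
```

body_on_host is always 1. region_on_host is 0 only when a device is actually available; otherwise the target region falls back to the host.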

Imagine the following very artificial example:

int data[N];

#pragma omp target
#pragma omp parallel for
for (int i = 0; i < N; i++)
   data[i] *= 2;

// Do something else

#pragma omp target
#pragma omp parallel for
for (int i = 0; i < N; i++)
   data[i] += 5;

In that case, each of the two target constructs creates its own device data environment, and data is implicitly mapped as tofrom. That means the following sequence of actions takes place:

  1. data is copied to the device
  2. The first loop runs on the device
  3. data is copied from the device
  4. The host executes // Do something else
  5. data is copied to the device
  6. The second loop runs on the device
  7. data is copied from the device
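The seven steps above correspond to each target construct carrying its own tofrom mapping. As a sketch, assuming data is passed as a pointer (the function name scale_then_shift_separate is hypothetical), the array section has to be named explicitly, which makes the two round trips visible:

```c
/* First example with the implicit tofrom mappings spelled out.
 * data is a pointer here (an assumption), so the array section data[0:n]
 * must be listed explicitly; each target construct copies it in and out. */
void scale_then_shift_separate(int *data, int n)
{
    /* Steps 1-3: copy to the device, run the loop, copy back. */
    #pragma omp target map(tofrom: data[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        data[i] *= 2;

    /* Step 4: "Do something else" on the host's up-to-date copy. */

    /* Steps 5-7: the whole array is transferred again in both directions. */
    #pragma omp target map(tofrom: data[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        data[i] += 5;
}
```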

Now imagine that // Do something else reads data but never modifies it. That makes the transfer of data to the device in step 5 redundant: the device copy could simply be retained in the state it has after step 2. This is where the target data construct comes into play. It allows you to create a data environment that spans more than the extent of a single target construct. The example above can then be rewritten as follows:

int data[N];

#pragma omp target data map(tofrom: data)
{
   #pragma omp target
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      data[i] *= 2;

   #pragma omp target update from(data)

   // Do something else

   #pragma omp target
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      data[i] += 5;
}

The target constructs in that case do not create new device data environments but rather utilise the one created by the target data construct (actually they do create new data environments, but those are merged with the one from the target data and they do not contain any new references). So the sequence of operations is:

  1. data is copied to the device
  2. The first loop runs on the device
  3. data is explicitly copied from the device
  4. The host executes // Do something else
  5. The second loop runs on the device
  6. data is copied from the device

Since data is needed in // Do something else, but is only transferred back from the device automatically at the end of the target data construct, the explicit target update is used at step 3 to copy it into the encountering task's data environment.
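Here is a sketch of the same six-step sequence with data passed as a pointer, assuming OpenMP 4.5 or later. The function name scale_then_shift_shared is hypothetical, and the host_sum loop stands in for "Do something else" reading data. Repeating the map clauses on the inner target constructs is a choice made here for clarity: since data[0:n] is already present in the device data environment, they do not trigger additional transfers.

```c
/* target data version of the example, assuming data is passed as a pointer.
 * Returns the sum the host computes in the "Do something else" step. */
long scale_then_shift_shared(int *data, int n)
{
    long host_sum = 0;

    #pragma omp target data map(tofrom: data[0:n])   /* step 1: copy in once */
    {
        /* data[0:n] is already present, so this map clause causes no transfer. */
        #pragma omp target map(tofrom: data[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            data[i] *= 2;                             /* step 2 */

        /* Step 3: refresh the host copy explicitly. */
        #pragma omp target update from(data[0:n])

        /* Step 4: the host only reads data, so nothing is sent back. */
        for (int i = 0; i < n; i++)
            host_sum += data[i];

        #pragma omp target map(tofrom: data[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            data[i] += 5;                             /* step 5 */
    }   /* step 6: data[0:n] is copied back when the region ends */

    return host_sum;
}
```

Compared with the previous version, the array crosses to the device once and comes back twice (once via target update, once at the end of the region) instead of making two full round trips.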

Now this is just a small and very artificial example but in real life saving on unnecessary data transfers could significantly improve the performance of OpenMP applications that offload computations to co-processors and/or accelerators.

Upvotes: 9

Ilya Verbin

Reputation: 695

#pragma omp target data only maps the variables to the target device; it doesn't execute any code there. #pragma omp target maps the variables and also executes the target region on the target device.

So, in your example:

  • vec_mult1 and vec_mult2 will execute the loop on the target device;
  • vec_mult3 will map the variables to the target device but execute the loop on the host.
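To make vec_mult3 offload its loop, a target construct has to be nested inside the target data region. A minimal sketch, assuming OpenMP 4.5 or later (vec_mult3_offloaded is a hypothetical name, and the question's init() and output() calls are left out). The inner map clauses repeat the outer ones; because the sections are already present they cause no extra transfers, and they keep the pointers correctly translated on the device.

```c
/* vec_mult3 with a target construct added inside target data,
 * so the loop is offloaded instead of running on the host. */
void vec_mult3_offloaded(float *p, const float *v1, const float *v2, int N)
{
    #pragma omp target data map(to: v1[0:N], v2[0:N]) map(from: p[0:N])
    {
        /* Without this target construct the loop would run on the host. */
        #pragma omp target map(to: v1[0:N], v2[0:N]) map(from: p[0:N])
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            p[i] = v1[i] * v2[i];
    }   /* p[0:N] is copied back to the host here */
}
```

With only one target region inside, this behaves the same as vec_mult1. The separate target data construct only pays off when several target regions reuse the same mapped data.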

Upvotes: 2
