Changseok Ma

How to offload to the GPU with OpenACC on Windows?

I am trying to use OpenACC on Windows, compiling with GCC 8.1.0.

I found some sample code online that uses OpenACC.

From the command prompt, I compiled it as follows:

    C:\Users\chang>g++ -fopenacc -o C:\Users\chang\source\repos\Project18\Project18\testing.exe C:\Users\chang\source\repos\Project18\Project18\Source1.cpp

While the program is running, the Performance tab in Task Manager shows no change in GPU usage.

Also, if I compile without -fopenacc:

    C:\Users\chang>g++ -o C:\Users\chang\source\repos\Project18\Project18\testing.exe C:\Users\chang\source\repos\Project18\Project18\Source1.cpp

the program runs just as fast as it does with -fopenacc.

So is there some prerequisite I need to set up before I can use OpenACC?

Below is the sample code I found.

Thanks in advance.

P.S. As far as I remember, I never downloaded openacc.h, and I couldn't find where to get it online. Could this be a problem? Since the code compiles and the .exe runs, I assume it isn't, but I'm mentioning it just in case.

    /*
     *  Copyright 2012 NVIDIA Corporation
     *
     *  Licensed under the Apache License, Version 2.0 (the "License");
     *  you may not use this file except in compliance with the License.
     *  You may obtain a copy of the License at
     *
     *      http://www.apache.org/licenses/LICENSE-2.0
     *
     *  Unless required by applicable law or agreed to in writing, software
     *  distributed under the License is distributed on an "AS IS" BASIS,
     *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     *  See the License for the specific language governing permissions and
     *  limitations under the License.
     */
    #include <iostream>
    #include <math.h>    // fmax, fabs
    #include <stdio.h>   // printf
    #include <string.h>  // memset
    #include <openacc.h>
    #include <chrono>

    #define NN 4096
    #define NM 4096
    using namespace std;
    using namespace chrono;

    double A[NN][NM];
    double Anew[NN][NM];

    int main(int argc, char** argv)
    {
        const int n = NN;
        const int m = NM;
        const int iter_max = 1000;

        const double tol = 1.0e-6;
        double error = 1.0;

        memset(A, 0, n * m * sizeof(double));
        memset(Anew, 0, n * m * sizeof(double));

        // Boundary condition: the first column is held at 1.0
        for (int j = 0; j < n; j++)
        {
            A[j][0] = 1.0;
            Anew[j][0] = 1.0;
        }

        printf("Jacobi relaxation Calculation: %d x %d mesh\n", n, m);

        system_clock::time_point start = system_clock::now();
        int iter = 0;
        // Keep A (copied in and out) and Anew (device-only) resident
        // on the device for the whole iteration loop
        #pragma acc data copy(A), create(Anew)
        while (error > tol && iter < iter_max)
        {
            error = 0.0;

            // 4-point stencil update, tracking the largest change
            #pragma acc kernels
            for (int j = 1; j < n - 1; j++)
            {
                for (int i = 1; i < m - 1; i++)
                {
                    Anew[j][i] = 0.25 * (A[j][i + 1] + A[j][i - 1]
                        + A[j - 1][i] + A[j + 1][i]);
                    error = fmax(error, fabs(Anew[j][i] - A[j][i]));
                }
            }

            // Copy the interior back into A for the next iteration
            #pragma acc kernels
            for (int j = 1; j < n - 1; j++)
            {
                for (int i = 1; i < m - 1; i++)
                {
                    A[j][i] = Anew[j][i];
                }
            }

            if (iter % 100 == 0) printf("%5d, %0.6f\n", iter, error);

            iter++;
        }

        system_clock::time_point end = system_clock::now();
        std::chrono::duration<float> sec = end - start;
        cout << sec.count() << endl;
    }

Answers (1)

tschwinge

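At this time, GCC doesn't support offloading code to the GPU on Windows, so with -fopenacc the OpenACC compute regions in your program simply run on the host CPU. That is why you see neither GPU usage nor a speedup. See https://stackoverflow.com/a/59376314/664214, or http://mid.mail-archive.com/[email protected], for example. It's certainly possible to implement, but somebody needs to do it, or pay for the work.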
