BHawk

Reputation: 2462

Unexpected behavior: Empty loop causes improved results in blur function

I've attended a few Halide panels at SIGGRAPH over the years, and I finally decided to test whether it would be worth porting my existing software to it. So far the results have been impressive.

I was writing a Gaussian blur based on the code presented at SIGGRAPH 2015 and ran into some weird behavior that I can't make sense of. I'm not sure whether it is my own misunderstanding or some kind of bug/"feature".

See the code below, and note the empty loop. gkernel and normalize are functions I've written to produce the Gaussian coefficients. When I compile and run the code with the loop commented out, the output image is black (all zeros). When I leave the empty loop in, the function executes much faster and the output image is correctly blurred.

Am I missing something fundamental or is this some sort of bug? I'm using MSVS Professional 2013 on Windows 7.

Function Code:

Func HalideGBlur(Func f){
    float k[3];
    gkernel(k);
    normalize(k);

    for (int i = 0; i < 1; i++){
        ;
    }

    // Vertical pass: k[1] weights the center tap, k[0] the two neighbors.
    Func ypass;
    ypass(X, Y, C) = ( k[1] * f(X, Y, C) +
                       k[0] * (f(X, Y - 1, C) + f(X, Y + 1, C)) );
    // Horizontal pass over the result of the vertical pass.
    Func xpass;
    xpass(X, Y, C) = ( k[1] * ypass(X, Y, C) +
                       k[0] * (ypass(X - 1, Y, C) + ypass(X + 1, Y, C)) );

    // Scheduling for the x and y passes
    xpass.compute_root().vectorize(X, 8).parallel(Y);
    ypass.compute_at(xpass, Y).vectorize(X, 8);
    return xpass;
}

Relevant Execution code:

Func g = HalideGBlur(bounded_image);

htime = ocvtime = FLT_MAX;
cout << "\n****Testing Gaussian Blur****\n";
//Run Halide tests
for (int x = 0; x < 10; x++){
    start_time = omp_get_wtime();
    g.realize(output);
    end = omp_get_wtime() - start_time;
    if (end < htime){ htime = end; }
}
cout << "halide best: " << htime << "\n";
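The best-of-ten loop above could also be factored into a small helper. This is only a sketch: it uses std::chrono instead of omp_get_wtime, and best_of is a hypothetical name, not part of my actual test harness. Taking the minimum over several runs should also hide the cost of the first realize call, which typically includes Halide's JIT compilation.

```cpp
#include <chrono>
#include <limits>

// Hypothetical helper: runs `work` `runs` times and returns the
// fastest wall-clock time in seconds.
template <typename F>
double best_of(int runs, F&& work) {
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
        if (dt.count() < best) best = dt.count();  // keep the fastest run
    }
    return best;
}
```

With this in place, the Halide measurement would read something like `double htime = best_of(10, [&] { g.realize(output); });`.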

Results without the meaningless loop:

****Testing Gaussian Blur****
halide best: 0.0246554
ocv best: 0.0318704
Halide is 1.2926 times as fast as OpenCV.

Results with the meaningless loop:

****Testing Gaussian Blur****
halide best: 0.00749808
ocv best: 0.0317644
Halide is 4.2363 times as fast as OpenCV.

Upvotes: 3

Views: 185

Answers (1)

Andrew Adams

Reputation: 1436

That's a puzzler. Maybe you have a memory-stomping bug, and that loop is affecting the stack frame layout. Is there a Valgrind equivalent on Windows you can use to check for this?
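One way to test the memory-stomping theory without extra tooling is to make the kernel buffer bounds-checked. Here is a minimal sketch, assuming gkernel and normalize fill a 3-tap kernel. Their real bodies aren't shown in the question, so the versions below (including the Kernel3 alias, the sigma parameter, and overrun_is_caught) are hypothetical stand-ins:

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-ins for the question's gkernel/normalize helpers.
// std::array replaces the raw float k[3] so that at() can check bounds.
using Kernel3 = std::array<float, 3>;

// Fills k with unnormalized Gaussian weights; sigma = 1.0f is an assumed default.
void gkernel(Kernel3& k, float sigma = 1.0f) {
    for (std::size_t i = 0; i < k.size(); ++i) {
        float d = static_cast<float>(i) - 1.0f;  // tap offsets -1, 0, +1
        // at() throws std::out_of_range instead of silently writing past the end.
        k.at(i) = std::exp(-(d * d) / (2.0f * sigma * sigma));
    }
}

// Scales the weights so that they sum to 1.
void normalize(Kernel3& k) {
    float sum = 0.0f;
    for (float v : k) sum += v;
    for (float& v : k) v /= sum;
}

// Returns true if an out-of-range write is caught rather than corrupting the stack.
bool overrun_is_caught() {
    Kernel3 k{};
    try {
        k.at(3) = 1.0f;  // deliberate off-by-one: index 3 in a 3-element array
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

If the real gkernel writes more than three coefficients, at() would throw at the offending index instead of overwriting neighboring stack data. Building with MSVC's /RTCs stack-frame runtime check is another way to catch overruns of local arrays.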

Upvotes: 1
