Reputation: 2462
I've attended a few Halide panels over the years at SIGGRAPH, and I finally decided to do some testing to determine whether it would be worth porting my existing software to it. So far the results have been impressive.
I was writing a Gaussian blur based on the code presented at SIGGRAPH 2015 and ran into some weird behavior that I can't make sense of. I'm not sure if it is my own misunderstanding or some kind of bug/"feature".
See the code below and note the empty loop. gkernel and normalize are functions I've written to produce the Gaussian coefficients. When I compile and run the code with the loop commented out, the output image is black (all zeros). When I leave the empty loop in, the function executes much faster and the output image is correctly blurred.
Am I missing something fundamental or is this some sort of bug? I'm using MSVS Professional 2013 on Windows 7.
Function Code:
    Func HalideGBlur(Func f){
        float k[3];
        gkernel(k);
        normalize(k);
        for (int i = 0; i < 1; i++){
            ;
        }
        Func ypass;
        ypass(X, Y, C) = ( k[1] * f(X, Y, C) +
                           k[0] * (f(X, Y - 1, C) + f(X, Y + 1, C)) );
        Func xpass;
        xpass(X, Y, C) = ( k[1] * ypass(X, Y, C) +
                           k[0] * (ypass(X - 1, Y, C) + ypass(X + 1, Y, C)) );
        //scheduling for x and y passes
        xpass.compute_root().vectorize(X, 8).parallel(Y);
        ypass.compute_at(xpass, Y).vectorize(X, 8);
        return xpass;
    }
Relevant Execution code:
    Func g = HalideGBlur(bounded_image);
    htime = ocvtime = FLT_MAX;
    cout << "\n****Testing Gaussian Blur****\n";
    //Run Halide tests
    for (int x = 0; x < 10; x++){
        start_time = omp_get_wtime();
        g.realize(output);
        end = omp_get_wtime() - start_time;
        if (end < htime){ htime = end; }
    }
    cout << "halide best: " << htime << "\n";
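For reference, the best-of-10 timing loop above could also be written with std::chrono instead of omp_get_wtime, which takes OpenMP out of the benchmark harness. This is only a minimal sketch: best_of is a hypothetical helper name, not part of Halide or of the code above. Taking the minimum matters here because the first realize call on a JIT-compiled Func also pays for compilation.

```cpp
#include <chrono>
#include <limits>

// Hypothetical helper (not from the original post): returns the smallest
// wall-clock time, in seconds, observed over `runs` calls of `work`.
template <typename F>
double best_of(int runs, F&& work) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < runs; ++i) {
        const auto t0 = clock::now();
        work();
        const std::chrono::duration<double> dt = clock::now() - t0;
        if (dt.count() < best) best = dt.count();
    }
    return best;
}
```

With this in place the Halide measurement would reduce to something like htime = best_of(10, [&]{ g.realize(output); });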
Results without the meaningless loop:
****Testing Gaussian Blur****
halide best: 0.0246554
ocv best: 0.0318704
Halide is 1.2926 times as fast as OpenCV.
Results with the meaningless loop:
****Testing Gaussian Blur****
halide best: 0.00749808
ocv best: 0.0317644
Halide is 4.2363 times as fast as OpenCV.
Upvotes: 3
Views: 185
Reputation: 1436
That's a puzzler. Maybe you have a memory-stomping bug and that loop is affecting stack frame layout. Is there a valgrind equivalent on Windows you can use to check for this?
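If no such tool is handy, one cheap way to test the memory-stomping hypothesis is to run the kernel-filling code on a buffer with guard slots on both sides and check that the guards survive. The sketch below is an assumption-laden illustration, not a diagnosis: good_kernel and bad_kernel are hypothetical stand-ins (the real gkernel and normalize are not shown in the question), and the guard size and sentinel value are arbitrary.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

constexpr float kSentinel = -12345.0f;  // arbitrary marker value for guard slots
constexpr std::size_t kGuard = 4;       // guard floats on each side (arbitrary)
constexpr std::size_t kTaps = 3;        // matches `float k[3]` in the question

// Calls `fill` on a 3-float window placed between two guard regions and
// returns true only if every guard slot still holds the sentinel afterwards.
bool stays_in_bounds(const std::function<void(float*)>& fill) {
    std::vector<float> buf(kTaps + 2 * kGuard, kSentinel);
    fill(buf.data() + kGuard);
    for (std::size_t i = 0; i < kGuard; ++i) {
        if (buf[i] != kSentinel) return false;                  // front guard
        if (buf[kGuard + kTaps + i] != kSentinel) return false; // back guard
    }
    return true;
}

// Hypothetical kernels for illustration only.
void good_kernel(float* k) { k[0] = 0.25f; k[1] = 0.5f; k[2] = 0.25f; }
void bad_kernel(float* k)  { for (int i = 0; i < 5; ++i) k[i] = 0.2f; } // writes 5 taps
```

Assuming the real functions accept a float*, the check would be stays_in_bounds([](float* k){ gkernel(k); normalize(k); }). A false result would point at an overrun of the 3-element array; a true result only rules out writes that land within the guard slots.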
Upvotes: 1