Alex Taylor
Alex Taylor

Reputation: 1412

Integral image box filtering

I'm trying to compare the performance in Halide of a 2-pass, separable approach to the integral-image-based box filtering approach to gain a better understanding of Halide scheduling. I cannot find any examples of Integral Image creation in Halide where the Integral Image Function is used in the defintion of a subsequent function.

ImageParam input(type_of<uint8_t>(), 3, "image 1");

Var x("x"), y("y"), c("c"), xi("xi"), yi("yi");

Func ip("ip");

ip(x, y, c) = cast<float>(BoundaryConditions::repeat_edge(input)(x, y, c));

Param<int> radius("radius", 15, 1, 50);

RDom imageDomain(input);
RDom r(-radius, radius, -radius, radius);

// Make an integral image
Func integralImage = ip;
integralImage(x, imageDomain.y, c) += integralImage(x, imageDomain.y - 1, c);
integralImage(imageDomain.x, y, c) += integralImage(imageDomain.x - 1, y, c);

integralImage.compute_root(); // Come up with a better schedule for this

// Apply box filter to integral image
Func outputImage;

outputImage(x,y,c) = integralImage(x+radius,y+radius,c)
                + integralImage(x-radius,y-radius,c)
                - integralImage(x-radius,y+radius,c)
                - integralImage(x-radius,y+radius,c);

Expr normFactor = (2*radius+1) * (2*radius+1);
outputImage(x,y,c) = outputImage(x,y,c) / normFactor;
result(x,y,c) = cast<uint8_t>(outputImage(x,y,c));

result.parallel(y).vectorize(x,8)

I did find the following code in the tests:

https://github.com/halide/Halide/blob/master/test/correctness/multi_pass_reduction.cpp

But this example uses realize to compute the integral image as a buffer over a fixed domain and doesn't consume the definition of integral image as a function in the definition of a subsequent function.

When I run this code, I observe that:

  1. The computation of the Integral Image is extremely slow (moves my pipeline to 0 fps)
  2. I get an incorrect answer. I feel like I must be somehow misdefining my integral image

I also have a related question, how would one best schedule the computation of an integral image in this type of scenario in Halide?

Upvotes: 1

Views: 1061

Answers (1)

Alex Taylor
Alex Taylor

Reputation: 1412

My problem was in the definition of my integral image. If I change my implementation to the standard one pass definition of the integral image, I get expected behavior:

Func integralImage;
integralImage(x,y,c) = 0.0f; // Pure definition
integralImage(intImDom.x,intImDom.y,c) = ip(intImDom.x,intImDom.y,c)
                  + integralImage(intImDom.x-1,intImDom.y,c)
                  + integralImage(intImDom.x,intImDom.y-1,c)
                  - integralImage(intImDom.x-1,intImDom.y-1,c);

integralImage.compute_root();

I still have remaining questions about the most efficient algorithm/schedule in Halide for computing an integral image, but I'll re-post that as a more specific question, as my current post was kind of open-ended.

As an aside, there is a second problem in the code above in that padding of the input image is not handled correctly.

Upvotes: 1

Related Questions