image analysis and detecting unanalyzed areas

Question

So here's a problem and question:

I'm analyzing a document page on an HTML5 Canvas and detecting certain features such as boxes, labels, text blocks, images, and tables. Because Canvas pixel reads and writes are slow, and the image needs to be high-resolution for good accuracy (e.g. 1500 x 2500), I cannot afford to analyze every pixel, let alone in multiple passes.

My algorithm pokes random pixels and does some minimal analysis around each one to decide whether there is a usable bounding box for further processing, and which type of processing is needed; some parts, such as OCR, may be sent to the server.

Every subsequent random poke is checked against a growing list of bounding boxes already found, and is retried elsewhere until it lands in uncharted territory. The technique is surprisingly simple and effective, but it wastes a lot of random pokes and does not give consistent results without a large poke count (1% of the area), and even then it intermittently misses some parts.
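For concreteness, the poke-and-reject loop described above could look roughly like the sketch below. The function names, the `{x, y, w, h}` box shape, and the `maxTries` cutoff are assumptions made for illustration, not the actual code.

```javascript
// Assumed box shape: {x, y, w, h} in pixel coordinates (hypothetical).
function isInsideAnyBox(boxes, px, py) {
  return boxes.some(
    (b) => px >= b.x && px < b.x + b.w && py >= b.y && py < b.y + b.h
  );
}

// Draw random points until one falls outside every known box.
// Returns null after maxTries rejections (page nearly covered, or bad luck).
function nextPoke(boxes, width, height, maxTries = 1000, rand = Math.random) {
  for (let i = 0; i < maxTries; i++) {
    const px = Math.floor(rand() * width);
    const py = Math.floor(rand() * height);
    if (!isInsideAnyBox(boxes, px, py)) {
      return { x: px, y: py };
    }
  }
  return null;
}
```

The cost of this approach grows as the page fills up: the more area the boxes cover, the more draws are rejected before a usable point turns up.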

What would be great is a spatial analysis algorithm that can tell me where the unpoked areas outside all bounding boxes are, so that I can restrict my random x/y coordinate selection to those areas only. That should improve both efficacy and speed significantly.
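One possible way to get that restriction, shown here only as a minimal sketch, is to keep a list of free rectangles that starts as the whole page and to cut each newly found bounding box out of it. A poke then picks a free rectangle with probability proportional to its area and a uniform point inside it. The helper names and the `{x, y, w, h}` shape are assumptions, and this is a plain rectangle-subtraction approach rather than any particular library's API.

```javascript
// Subtract box b from free rectangle f.
// Returns up to four disjoint pieces of f that b does not cover.
function subtractRect(f, b) {
  const x1 = Math.max(f.x, b.x);
  const y1 = Math.max(f.y, b.y);
  const x2 = Math.min(f.x + f.w, b.x + b.w);
  const y2 = Math.min(f.y + f.h, b.y + b.h);
  if (x1 >= x2 || y1 >= y2) return [f]; // no overlap, keep f unchanged

  const out = [];
  // Full-width strips above and below the overlap.
  if (y1 > f.y) out.push({ x: f.x, y: f.y, w: f.w, h: y1 - f.y });
  if (y2 < f.y + f.h) out.push({ x: f.x, y: y2, w: f.w, h: f.y + f.h - y2 });
  // Strips left and right of the overlap, limited to the overlap's rows.
  if (x1 > f.x) out.push({ x: f.x, y: y1, w: x1 - f.x, h: y2 - y1 });
  if (x2 < f.x + f.w) out.push({ x: x2, y: y1, w: f.x + f.w - x2, h: y2 - y1 });
  return out;
}

// Remove a newly found bounding box from the whole free list.
function removeBox(free, b) {
  return free.flatMap((f) => subtractRect(f, b));
}

// Pick a point uniformly over the free area; null when nothing is left.
function pokeInFree(free, rand = Math.random) {
  const total = free.reduce((sum, f) => sum + f.w * f.h, 0);
  if (total === 0) return null;
  let t = rand() * total;
  let chosen = free[free.length - 1];
  for (const f of free) {
    const area = f.w * f.h;
    if (t < area) {
      chosen = f;
      break;
    }
    t -= area;
  }
  return {
    x: chosen.x + Math.floor(rand() * chosen.w),
    y: chosen.y + Math.floor(rand() * chosen.h),
  };
}
```

Usage would be along the lines of `let free = [{x: 0, y: 0, w: 1500, h: 2500}]`, then `free = removeBox(free, box)` after every detection and `pokeInFree(free)` for the next coordinate. No draw is wasted, and `pokeInFree` returning `null` signals that the page is fully covered. The free list can fragment as boxes accumulate, so merging or dropping very small pieces may be worth considering.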

My typical box count for a fully analyzed doc page is < 200.

Does any algorithm exist in the public domain/wiki that can do this in JavaScript reasonably fast?

