Reputation: 35958
I am using opencv_contrib to detect textual regions in an image.
This is the original image
This is the image after textual regions are found:
As can be seen, there are overlapping groups in the image. For example, there seem to be two groups around "Hello World" and two around "Some more sample text".
Question
In scenarios like these, how can I keep the widest possible box by merging the two boxes? For these examples, that would be one box starting with "H" and ending in "d", so that it covers "Hello World". My reason for doing this is that I would like to crop that part of the image and send it to tesseract.
Here is the relevant code that draws the boxes.
void groups_draw(Mat &src, vector<Rect> &groups)
{
    for (int i=(int)groups.size()-1; i>=0; i--)
    {
        if (src.type() == CV_8UC3)
            rectangle(src,groups.at(i).tl(),groups.at(i).br(),Scalar( 0, 255, 255 ), 2, 8 );
    }
}
Here is what I've tried. My ideas are in comments.
void groups_draw(Mat &src, vector<Rect> &groups)
{
    int previous_tl_x = 0;
    int previous_tl_y = 0;
    int previous_br_x = 0;
    int previous_br_y = 0;
    //sort the groups from lowest to largest.
    for (int i=(int)groups.size()-1; i>=0; i--)
    {
        //if previous_tl_x is smaller than current_tl_x then keep the current one.
        //if previous_br_x is smaller than current_br_x then keep the current one.
        if (src.type() == CV_8UC3) {
            //crop the image
            Mat croppedImage = src(Rect(Point(groups.at(i).tl().x, groups.at(i).tl().y),Point(groups.at(i).br().x, groups.at(i).br().y)));
            imshow("cropped",croppedImage);
            waitKey(-1);
        }
    }
}
Update
I'm trying to use groupRectangles to accomplish this:
void groups_draw(Mat &src, vector<Rect> &groups)
{
    vector<Rect> rects;
    for (int i=(int)groups.size()-1; i>=0; i--)
    {
        rects.push_back(groups.at(i));
    }
    groupRectangles(rects, 1, 0.2);
}
However, this is giving me an error:
textdetection.cpp:106:5: error: use of undeclared identifier 'groupRectangles'
groupRectangles(rects, 1, 0.2);
^
1 error generated.
Upvotes: 5
Views: 325
Reputation: 1526
First, the reason you get overlapping bounding boxes is that the text detector sample runs on both the original channels and their inverted copies (e.g. gray and inverted gray). Because of that, the inner regions of some characters, such as the holes in o's and g's, are wrongly detected and grouped as characters. So if you only want to detect one mode of text (white text on a dark background), pass just the inverted channels. Replace:
for (int c = 0; c < cn-1; c++)
    channels.push_back(255-channels[c]);
With:
for (int c = 0; c < cn-1; c++)
    channels[c] = (255-channels[c]);
Now for your question: cv::Rect defines intersection and union operators:
rect = rect1 & rect2 (rectangle intersection)
rect = rect1 | rect2 (minimum area rectangle containing rect1 and rect2)
rect &= rect1, rect |= rect1 (and the corresponding compound assignment operations)
You can use those operators while iterating over the rectangles to detect intersecting rectangles and combine them, as follows:
if ((rect1 & rect2).area() != 0)
    rect1 |= rect2;
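To make the two operators concrete, here is a small sketch of the arithmetic behind them. `Box`, `intersect` and `unite` are hypothetical stand-ins (not OpenCV names) so that the snippet builds without OpenCV; with the library you would simply write `rect1 & rect2` and `rect1 | rect2`. The sketch assumes both boxes are non-empty.

```cpp
#include <algorithm>

// Hypothetical stand-in for cv::Rect: top-left corner plus size.
struct Box {
    int x, y, width, height;
    int area() const { return width * height; }
};

// Roughly what rect1 & rect2 computes: the shared region,
// or an all-zero box when the two do not overlap.
Box intersect(const Box &a, const Box &b) {
    int x1 = std::max(a.x, b.x);
    int y1 = std::max(a.y, b.y);
    int x2 = std::min(a.x + a.width, b.x + b.width);
    int y2 = std::min(a.y + a.height, b.y + b.height);
    if (x2 <= x1 || y2 <= y1) return Box{0, 0, 0, 0};
    return Box{x1, y1, x2 - x1, y2 - y1};
}

// Roughly what rect1 | rect2 computes for non-empty inputs:
// the smallest box that contains both.
Box unite(const Box &a, const Box &b) {
    int x1 = std::min(a.x, b.x);
    int y1 = std::min(a.y, b.y);
    int x2 = std::max(a.x + a.width, b.x + b.width);
    int y2 = std::max(a.y + a.height, b.y + b.height);
    return Box{x1, y1, x2 - x1, y2 - y1};
}
```

For two boxes around the same line of text, say `{10, 10, 100, 20}` and `{60, 15, 80, 20}`, the intersection is non-empty, so the pair qualifies for merging, and the union `{10, 10, 130, 25}` is the wider box you would crop.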
Edit:
First, sort rectangle groups by area from largest to smallest:
std::sort(groups.begin(), groups.end(),
[](const cv::Rect &rect1, const cv::Rect &rect2) -> bool {return rect1.area() > rect2.area();});
Then iterate over the rectangles; whenever two rectangles intersect, merge the smaller one into the larger one and erase the smaller one:
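for (size_t i = 0; i < groups.size(); i++)
{
    for (size_t j = i + 1; j < groups.size(); j++)
    {
        if ((groups[i] & groups[j]).area() != 0)
        {
            groups[i] |= groups[j];            // grow groups[i] to cover both boxes
            groups.erase(groups.begin() + j);  // drop the absorbed box
            j = i;                             // rescan: the grown box may now touch boxes skipped earlier
        }
    }
}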
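The same idea can be packaged as a reusable helper that keeps merging until no two boxes overlap. This is only a sketch: `mergeOverlapping` is a hypothetical name, and `Box` is a stand-in that mimics the `&`, `|=` and `area()` members of `cv::Rect` so the snippet builds without OpenCV. Because the function is a template, it should also accept `std::vector<cv::Rect>` unchanged.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Rect so the sketch builds without OpenCV.
struct Box {
    int x, y, width, height;
    int area() const { return width * height; }
    // Intersection; all zeros when the boxes do not overlap.
    Box operator&(const Box &o) const {
        int x1 = std::max(x, o.x), y1 = std::max(y, o.y);
        int x2 = std::min(x + width, o.x + o.width);
        int y2 = std::min(y + height, o.y + o.height);
        if (x2 <= x1 || y2 <= y1) return Box{0, 0, 0, 0};
        return Box{x1, y1, x2 - x1, y2 - y1};
    }
    // Grow this box to the bounding box of both.
    Box &operator|=(const Box &o) {
        int x1 = std::min(x, o.x), y1 = std::min(y, o.y);
        int x2 = std::max(x + width, o.x + o.width);
        int y2 = std::max(y + height, o.y + o.height);
        x = x1; y = y1; width = x2 - x1; height = y2 - y1;
        return *this;
    }
};

// Merge until no two rectangles overlap.
// RectT needs operator&, operator|= and area().
template <typename RectT>
std::vector<RectT> mergeOverlapping(std::vector<RectT> rects) {
    bool merged = true;
    while (merged) {
        merged = false;
        for (std::size_t i = 0; i < rects.size() && !merged; ++i) {
            for (std::size_t j = i + 1; j < rects.size(); ++j) {
                if ((rects[i] & rects[j]).area() != 0) {
                    rects[i] |= rects[j];            // absorb rects[j]
                    rects.erase(rects.begin() + j);  // and remove it
                    merged = true;                   // restart the scan from the top
                    break;
                }
            }
        }
    }
    return rects;
}
```

Restarting after every merge is quadratic at best, which is fine for the handful of text groups in one image. The restart matters because a box that has just grown can start to overlap a neighbour it did not touch before.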
Upvotes: 3
Reputation: 32742
One approach would be to compare every rectangle with every other rectangle to see whether they overlap or intersect. If the overlap is large enough, you can combine the two into one larger rectangle.
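One way to make "large enough" concrete is to measure how much of the smaller rectangle lies inside the other one and merge only above a threshold. The sketch below is an assumption about what such a test could look like, not something the answer specifies: `Box` is a stand-in for `cv::Rect`, and `overlapRatio`, `shouldMerge` and the default threshold of 0.5 are hypothetical choices.

```cpp
#include <algorithm>

// Hypothetical stand-in for cv::Rect: top-left corner plus size.
struct Box { int x, y, width, height; };

// Fraction of the smaller box that lies inside the other one (0.0 to 1.0).
double overlapRatio(const Box &a, const Box &b) {
    int w = std::min(a.x + a.width, b.x + b.width) - std::max(a.x, b.x);
    int h = std::min(a.y + a.height, b.y + b.height) - std::max(a.y, b.y);
    if (w <= 0 || h <= 0) return 0.0;  // no overlap at all
    int smaller = std::min(a.width * a.height, b.width * b.height);
    return smaller > 0 ? static_cast<double>(w) * h / smaller : 0.0;
}

// Merge only when at least minRatio of the smaller box is covered.
// The 0.5 default is an arbitrary starting point to tune per image set.
bool shouldMerge(const Box &a, const Box &b, double minRatio = 0.5) {
    return overlapRatio(a, b) >= minRatio;
}
```

Dividing by the smaller area means a small box nested inside a large one scores 1.0, which matches the case in the question where a fragment of a word sits inside the box for the whole line.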
Upvotes: 0