Deekshith Shetty

Reputation: 82

Removing lines from image

I am a beginner in OpenCV. I need to remove the horizontal and vertical lines from the image so that only the text remains (the lines cause trouble when extracting text with OCR). I am trying to extract text from the Nutrient Fact Table. Can anyone help me?

[Image: Nutrient Fact Table]

Upvotes: 0

Views: 1239

Answers (1)

Trés DuBiel

Reputation: 550

This was an interesting question, so I gave it a shot. Below I show how to extract and remove the horizontal and vertical lines; you can adapt the approach from there. To save time, I did not preprocess your image to crop out the background, as one should, so that is an avenue for improvement.

The result: [Image: result]

The code (edit: added vertical lines):

#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;
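int main(int argc, char** argv)
{
    // Load the image given as the first command-line argument
    if (argc < 2)
    {
        cerr << "Usage: " << argv[0] << " <image>" << endl;
        return 1;
    }
    Mat src = imread(argv[1]);
    // Stop if the image could not be loaded
    if (src.empty())
    {
        cerr << "Problem loading image!!!" << endl;
        return 1;
    }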
    Mat gray;
    if (src.channels() == 3)
    {
        cvtColor(src, gray, COLOR_BGR2GRAY);
    }
    else
    {
        gray = src;
    }

    //inverse binary img
    Mat bw;
    //this will hold the result, image to be passed to OCR
    Mat fin;
    //I find OTSU binarization best for text.
    //Would perform better if background had been cropped out
    threshold(gray, bw, 0, 255, THRESH_BINARY_INV | THRESH_OTSU);
    threshold(gray, fin, 0, 255, THRESH_BINARY | THRESH_OTSU);
    imshow("binary", bw);
    Mat dst;
    Canny( fin, dst, 50, 200, 3 );
    Mat str = getStructuringElement(MORPH_RECT, Size(3,3));
    dilate(dst, dst, str, Point(-1, -1), 3);
    imshow("dilated_canny", dst);
    //bitwise_and w/ canny image helps w/ background noise
    bitwise_and(bw, dst, dst);
    imshow("and", dst);
    Mat horizontal = dst.clone();
    Mat vertical = dst.clone();
    fin = ~dst;

    //Horizontal lines are extracted from the `horizontal` clone made above
    //Selected this value arbitrarily
    int horizontalsize = horizontal.cols / 30;
    Mat horizontalStructure = getStructuringElement(MORPH_RECT, Size(horizontalsize,1));
    erode(horizontal, horizontal, horizontalStructure, Point(-1, -1));
    dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1), 1);
    imshow("horizontal_lines", horizontal);

    //Need to find horizontal contours, so as to not damage letters
    vector<Vec4i> hierarchy;
    vector<vector<Point> >contours;
    findContours(horizontal, contours, hierarchy, RETR_TREE, CHAIN_APPROX_NONE);
    for (const auto& c : contours)
    {
        Rect r = boundingRect(c);

        float percentage_height = (float)r.height / (float)src.rows;
        float percentage_width = (float)r.width / (float)src.cols;

        //These exclude contours that probably are not dividing lines
        if (percentage_height > 0.05)
            continue;

        if (percentage_width < 0.50)
            continue;
        //fills in line with white rectangle
        rectangle(fin, r, Scalar(255,255,255), FILLED);
    }

    int verticalsize = vertical.rows / 30;
    Mat verticalStructure = getStructuringElement(MORPH_RECT, Size(1,verticalsize));
    erode(vertical, vertical, verticalStructure, Point(-1, -1));
    dilate(vertical, vertical, verticalStructure, Point(-1, -1), 1);
    imshow("vertical_lines", vertical);

    findContours(vertical, contours, hierarchy, RETR_TREE, CHAIN_APPROX_NONE);
    for (const auto& c : contours)
    {
        Rect r = boundingRect(c);

        float percentage_height = (float)r.height / (float)src.rows;
        float percentage_width = (float)r.width / (float)src.cols;

        //These exclude contours that probably are not dividing lines
        if (percentage_width > 0.05)
            continue;

        if (percentage_height < 0.50)
            continue;
        //fills in line with white rectangle
        rectangle(fin, r, Scalar(255,255,255), FILLED);
    }

    imshow("Result", fin);
    waitKey(0);
    return 0;
}

The limitation of this approach is that the lines need to be straight. Because the bottom line is curved, the fill cuts slightly into the "E" in "Energy". Perhaps with Hough line detection, as has been suggested (I've never used it), a similar but more robust approach could be devised. Also, filling in the lines with rectangles is probably not the best approach.

Upvotes: 3
