stepovr

Reputation: 197

Extract data from image containing table grid using python

I have images such as the one attached below. I need to extract the data within the grid along with the tabular structure and transform it into a dataframe/csv.


I am using OCR to extract the text along with its coordinates, but to recover the table structure I would also like to extract the horizontal and vertical grid lines.

Is there a method in OpenCV for this that generalizes well?

So far, the approaches I've come across are:

1. Hough lines
2. Extracting rectangular contours
3. Drawing vertical and horizontal contours

Upvotes: 0

Views: 4021

Answers (2)

Knight Forked

Reputation: 1619

With all due respect to @Chrys Bltr, the solution in the link is a little overkill. Here's what I think is a simpler solution:

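import cv2
import matplotlib.pyplot as plt

img_bgr = cv2.imread('your/image')  # OpenCV loads images in BGR channel order
img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)

# Binarize with a local (adaptive) mean threshold
th = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 3, 3)

# Find contours on the thresholded image, not on the grayscale one.
# [-2] picks the contour list in both OpenCV 3 (three return values)
# and OpenCV 4 (two return values).
ctrs = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
im_h, im_w = img.shape
im_area = im_w * im_h
for ctr in ctrs:
    x, y, w, h = cv2.boundingRect(ctr)
    # Keep only contours whose bounding box covers 1% to 10% of the image area
    if 0.01 * im_area < w * h < 0.1 * im_area:
        cv2.rectangle(img_bgr, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Convert to RGB so matplotlib shows the colors correctly
plt.imshow(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
plt.show()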

You can store each rectangle's (x, y, w, h) while filtering above and then run the OCR inside each rectangular area individually.
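Once the rectangles are stored, they still have to be arranged into rows and columns before they can become a dataframe or CSV. Here is a minimal sketch of one way to do that, assuming each cell is an (x, y, w, h) tuple as returned by cv2.boundingRect. The helper name boxes_to_rows and the row_tol tolerance are my own illustration, not part of OpenCV, and the OCR step itself is left out.

```python
def boxes_to_rows(boxes, row_tol=10):
    """Group (x, y, w, h) boxes into table rows, each sorted left to right.

    A box joins the current row when its top edge lies within `row_tol`
    pixels of the top edge of that row's first box. `row_tol` is an
    assumed tuning value; adjust it to the image resolution.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):  # scan top to bottom
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(box)   # same row as the previous boxes
        else:
            rows.append([box])     # start a new row
    # Order the cells of each row by their left edge
    return [sorted(row, key=lambda b: b[0]) for row in rows]


# Hypothetical cell rectangles, in no particular order
cells = [(120, 52, 100, 40), (10, 50, 100, 40),
         (10, 100, 100, 40), (120, 101, 100, 40)]
for row in boxes_to_rows(cells):
    print(row)
```

Each inner list then corresponds to one table row, so running OCR on the cells in that order gives the values for one CSV line.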

Upvotes: 1

Chrys Bltr

Reputation: 78

You can define a grid structure and extract the information from each separate area with OpenCV. Check the article "A Box detection algorithm for any image containing boxes".

Everything is explained very clearly there.

Upvotes: 2
