Reputation: 197
I have images such as the one attached below. I need to extract the data within the grid along with the tabular structure and transform it into a dataframe/csv.
I am using OCR to extract the text along with the coordinates but in order to extract the table structure I would like to extract the horizontal and vertical grid lines.
Is there a method in OpenCV for this that would generalize well?
So far, the approaches I've come across are:
1. Hough lines
2. Extracting rectangular contours
3. Drawing vertical and horizontal contours
Upvotes: 0
Views: 4021
Reputation: 1619
With all due respect to @Chrys Bltr, the solution in the link is a little overkill. Here's what I think is a simpler solution:
    import cv2
    import matplotlib.pyplot as plt
    img_rgb = cv2.imread('your/image')
    img = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
    th = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 3, 3)
    # Find contours on the thresholded image (not the raw grayscale one).
    # Indexing with [-2] picks the contour list in both OpenCV 3 and OpenCV 4.
    ctrs = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
    im_h, im_w = img.shape
    im_area = im_w * im_h
    for ctr in ctrs:
        x, y, w, h = cv2.boundingRect(ctr)
        # Filter contours based on size
        if 0.01 * im_area < w * h < 0.1 * im_area:
            cv2.rectangle(img_rgb, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
    plt.imshow(cv2.cvtColor(img_rgb, cv2.COLOR_BGR2RGB))
    plt.show()
You can store the rectangle information in the filtering process above and then do the OCR within each individual rectangular area.
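To get from stored rectangles to a tabular structure, the boxes need to be grouped into rows and ordered within each row. Here is a minimal sketch of one way to do that, assuming each cell is an (x, y, w, h) tuple as returned by cv2.boundingRect. The function name boxes_to_grid and the tol parameter (vertical tolerance in pixels) are hypothetical and would need tuning for real scans.

```python
def boxes_to_grid(boxes, tol=10):
    """Arrange (x, y, w, h) boxes into rows, each row sorted left to right.

    tol is an assumed pixel tolerance: boxes whose top edges differ by at
    most tol from the first box of the current row are put in that row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):  # top to bottom
        if rows and abs(box[1] - rows[-1][0][1]) <= tol:
            rows[-1].append(box)   # same row as the previous boxes
        else:
            rows.append([box])     # clearly lower: start a new row
    return [sorted(row, key=lambda b: b[0]) for row in rows]


# Four cells of a 2x2 table, given in arbitrary order with slight jitter
boxes = [(110, 12, 100, 40), (10, 10, 100, 40),
         (10, 50, 100, 40), (110, 51, 100, 40)]
grid = boxes_to_grid(boxes)
```

Running OCR on the crop for each box in grid, in that order, gives a list of rows that can be written out as CSV or passed to a dataframe constructor.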
Upvotes: 1
Reputation: 78
You can define a grid structure and extract the information from each separate area with OpenCV. Check this article: A Box detection algorithm for any image containing boxes.
Everything is explained there in detail.
Upvotes: 2