Reputation: 554
I have a lot of images cropped from an image of a table. OCR has trouble detecting the text because of "leftovers" of the table borders. I'm looking for a way to remove them (I want to pick up only the text). Here are some examples:
Thanks!
Upvotes: 0
Views: 864
Reputation: 2714
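This code (based on OpenCV) solves the problem for the two examples. The procedure is the following:
1. Blur the image and binarize it with an Otsu threshold.
2. Invert the image so that text and border pixels become foreground.
3. Find the contours and, for each one, compute the ratio of its filled area to the area of its bounding box.
4. Keep only the contours whose ratio lies between 0.2 and 0.9: a ratio near 1 means a line fills its whole bounding box, and a very small ratio means a combination of thin lines.
Here is the Python code: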
import cv2
import matplotlib.pylab as plt
import numpy as np
# load image
img = cv2.imread('om9gN.jpg',0)
# blur and apply otsu threshold
img = cv2.blur(img, (3,3))
_, img = cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# invert image
img = (img == 0).astype(np.uint8)
img_new = np.zeros_like(img)
# find contours
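# [-2] selects the contour list in both OpenCV 3.x (three return values) and 4.x (two)
contours = cv2.findContours(img, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]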
for idx, cnt in enumerate(contours):
    # get area of contour
    temp = np.zeros_like(img)
    cv2.drawContours(temp, contours, idx, 1, -1)
    area_cnt = np.sum(temp)
    # get number of pixels of bounding box of contour
    x, y, w, h = cv2.boundingRect(cnt)
    area_box = w * h
    # get ratio of cnt-area and box-area
    ratio = float(area_cnt) / area_box
    # only draw contour if:
    # - 1.) ratio is not too big (line fills whole bounding box)
    # - 2.) ratio is not too small (combination of lines fill very
    #       small ratio of bounding box)
    if 0.9 > ratio > 0.2:
        cv2.drawContours(img_new, contours, idx, 1, -1)
plt.figure()
plt.subplot(1,2,1)
plt.imshow(img_new)
plt.axis("off")
plt.show()
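To see why the 0.2 to 0.9 window separates text from border leftovers, here is a minimal NumPy-only sketch of the ratio test on synthetic masks. The helper names `fill_ratio` and `keep_component` are hypothetical and not part of OpenCV, and the sketch counts foreground pixels directly instead of filling a contour, so it only approximates the code above for shapes with holes.

```python
import numpy as np

def fill_ratio(mask):
    """Fraction of the shape's bounding box covered by foreground pixels."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0.0
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    return ys.size / float(h * w)

def keep_component(mask, low=0.2, high=0.9):
    """Apply the same window as the answer: keep only mid-range ratios."""
    return low < fill_ratio(mask) < high

# A solid horizontal border segment: it fills its bounding box completely.
line = np.zeros((50, 50), np.uint8)
line[10:12, 5:45] = 1

# Two thin borders meeting in a corner: large bounding box, few pixels.
corner = np.zeros((50, 50), np.uint8)
corner[5:45, 5:7] = 1
corner[43:45, 5:45] = 1

# A crude "T" glyph: covers a moderate part of its bounding box.
glyph = np.zeros((50, 50), np.uint8)
glyph[10:13, 10:25] = 1
glyph[13:25, 16:19] = 1

for name, mask in [("line", line), ("corner", corner), ("glyph", glyph)]:
    print(name, round(fill_ratio(mask), 3), keep_component(mask))
```

Only the glyph falls inside the window here. On real crops the thresholds will probably need tuning, for example when a character touches a border and the two merge into one contour.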
Upvotes: 1