Reputation: 458
How to use OpenCV and pytesseract to extract text from an image?
import cv2
import pytesseract
from PIL import Image
import numpy as np
from matplotlib import pyplot as plt
img = Image.open('test.jpg').convert('L')
img.show()
img.save('test','png')
img = cv2.imread('test.png',0)
edges = cv2.Canny(img,100,200)
#contour = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#print pytesseract.image_to_string(Image.open(edges))
print pytesseract.image_to_string(edges)
But this gives the following error:
Traceback (most recent call last):
  File "open.py", line 14, in <module>
    print pytesseract.image_to_string(edges)
  File "/home/sroy8091/.local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 143, in image_to_string
    if len(image.split()) == 4:
AttributeError: 'NoneType' object has no attribute 'split'
Upvotes: 3
Views: 10316
Reputation: 395
If you want to do some pre-processing with OpenCV (such as the edge detection you did) and then extract text, convert the resulting array to a PIL image before passing it to pytesseract:
# Imports and other setup go here
img = cv2.imread('test.png',0)
edges = cv2.Canny(img,100,200)
img_new = Image.fromarray(edges)
text = pytesseract.image_to_string(img_new, lang='eng')
print(text)
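The same steps can be wrapped in small functions. This is only a sketch under some assumptions: `array_to_pil` and `ocr_edges` are hypothetical helper names (not part of OpenCV or pytesseract), the input is a single-channel array such as the output of `cv2.Canny`, and the Tesseract binary is installed for the OCR step. It also checks for `None`, since `cv2.imread` returns `None` instead of raising when the file cannot be read.

```python
import numpy as np
from PIL import Image


def array_to_pil(array):
    """Convert a 2-D array (e.g. the output of cv2.Canny) to a grayscale PIL image."""
    if array is None:
        # cv2.imread returns None when the file is missing or unreadable.
        raise ValueError("no image data; check the path given to cv2.imread")
    array = np.asarray(array)
    if array.ndim != 2:
        raise ValueError("expected a single-channel (2-D) array")
    # Cast to 8-bit so PIL creates a mode 'L' (grayscale) image.
    return Image.fromarray(array.astype(np.uint8))


def ocr_edges(path, low=100, high=200):
    """Read an image, run Canny edge detection, and OCR the result."""
    # Imported here so array_to_pil can be used without OpenCV or Tesseract.
    import cv2
    import pytesseract

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(array_to_pil(gray).__array__(), low, high) if gray is None else cv2.Canny(gray, low, high)
    return pytesseract.image_to_string(array_to_pil(edges), lang='eng')
```

Calling `ocr_edges('test.png')` would then return the recognized text as a string, or raise a `ValueError` if the file could not be read.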
Upvotes: 8
Reputation: 29
You cannot pass OpenCV objects directly to the pytesseract functions.
Try opening the file with PIL instead:
from PIL import Image
import pytesseract
image_file = 'test.png'
print(pytesseract.image_to_string(Image.open(image_file)))
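If the image is loaded in color with OpenCV rather than opened with PIL, one thing to watch is channel order: OpenCV stores color images as BGR, while PIL expects RGB. Below is a minimal sketch of that conversion, assuming a 3-channel 8-bit array; `bgr_to_pil` is a hypothetical helper name, not a function from either library.

```python
import numpy as np
from PIL import Image


def bgr_to_pil(bgr):
    """Convert an OpenCV-style BGR array of shape (H, W, 3) to an RGB PIL image."""
    bgr = np.asarray(bgr, dtype=np.uint8)
    if bgr.ndim != 3 or bgr.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    # Reverse the last axis (BGR -> RGB) and copy into contiguous memory.
    rgb = np.ascontiguousarray(bgr[:, :, ::-1])
    return Image.fromarray(rgb)


# Usage (needs OpenCV and the Tesseract binary installed):
#   import cv2, pytesseract
#   text = pytesseract.image_to_string(bgr_to_pil(cv2.imread('test.png')))
```

For grayscale input (for example `cv2.imread('test.png', 0)`) there is no channel order to fix, so `Image.fromarray` on the array is enough.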
Upvotes: 0