Reputation:
I have the code below, and it works for most image types. But for some reason it doesn't work with TIFF images that contain only one page, or with PDF files.
I get this error:
Traceback (most recent call last):
  File "/Users/fatiatravaille/Downloads/ocr_json/test.py", line 8, in <module>
    image = Image.open(r'./radio_lomb_300.tiff')
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/PIL/Image.py", line 3023, in open
    raise UnidentifiedImageError(
PIL.UnidentifiedImageError: cannot identify image file './radio_lomb_300.tiff'
import pytesseract
try:
    from PIL import Image
except ImportError:
    import Image
image = Image.open(r'./radio_lomb_300.tiff')
text=(pytesseract.image_to_string(image, lang='fra'))+'\n\n\n\n'
with open('text.test_ocr2','w') as fp: fp.write(text)
text=(pytesseract.image_to_boxes(image, lang='fra'))
with open('boundingBoxes.test_ocr2','w') as fp: fp.write(text)
text=(pytesseract.image_to_data(image, lang='fra'))
with open('data.test_ocr2','w') as fp: fp.write(text)
text=(pytesseract.image_to_osd(image))
with open('osd.test_ocr2','w') as fp: fp.write(text)
pdf = pytesseract.image_to_pdf_or_hocr(image, extension='pdf', lang='fra')
with open('test_ocr2.pdf', 'w+b') as f: f.write(pdf)
hocr = pytesseract.image_to_pdf_or_hocr(image, extension='hocr', lang='fra')
with open('test_ocr2.xml', 'w+b') as f: f.write(hocr)
alto = pytesseract.image_to_alto_xml(image)
with open('test_ocr_alto2.xml', 'w+b') as f: f.write(alto)
Upvotes: 0
Views: 406
Reputation: 8005
Did you try reading the image with OpenCV?
For example, when I read it with OpenCV:
import cv2
import pytesseract
# Load the image (OpenCV returns it as a BGR array)
img = cv2.imread("radio_lomb_300.tiff")
# Convert to grayscale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
txt = pytesseract.image_to_string(gry, config="--psm 6")
print(txt)
The result is:
Hirson le 06/04/2006
Mon Cher Confrére ,
Voici les clichés du rachis lombo-sacré et du bassin de face de :
Mme PINVIN Marie
Mise au point de lombalgies.
De face, le segment lombaire est pratiquement droit, de profil, la lordose est réguliére.
On note une déminéralisation osseuse diffuse.
Les vertébres sont de hauteur normale.
On note un pincement discal postérieur sur l’ensemble du segment lombaire.
Pincement discal plus important en LA-L5 et L5-S1.
En LA-L5, on note également un important pont osseux ostéophytique marginal antérieur
droit.
L’examen du bassin ne montre pas de lésion osseuse.
Les rapports articulaires sont intacts, inclinaison du bassin vers la droite d’environ 10 mm.
Avec mes remerciements, je vous prie de croire en l’expression de mes sentiments confra-
ternels les meilleurs.
Dr MICHEL HUBERTY
Examen numérisé : 7 incidences
SH
I'm not sure how accurate the output is, though. You can try other page segmentation modes (the --psm option) to improve the quality.
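To compare page segmentation modes side by side, something like the sketch below might help. The helper name `compare_psm_modes` and the particular modes tried (3, 4, 6, 11) are my own choices for illustration, not anything prescribed by pytesseract. It also assumes the French language data (`fra`) is installed, as in the question, and it checks for `None` because `cv2.imread` returns `None` when it cannot read a file.

```python
from typing import Callable, Dict, Iterable


def compare_psm_modes(
    image,
    ocr: Callable[..., str],
    modes: Iterable[int] = (3, 4, 6, 11),  # arbitrary selection, adjust as needed
    lang: str = "fra",  # assumes the French traineddata is installed
) -> Dict[int, str]:
    """Run `ocr` once per page segmentation mode and collect the text by mode."""
    results = {}
    for mode in modes:
        results[mode] = ocr(image, lang=lang, config=f"--psm {mode}")
    return results


if __name__ == "__main__":
    import cv2
    import pytesseract

    img = cv2.imread("radio_lomb_300.tiff")
    if img is None:
        # cv2.imread does not raise on failure, it returns None
        raise SystemExit("OpenCV could not read the file")

    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    outputs = compare_psm_modes(gry, pytesseract.image_to_string)
    for mode, text in outputs.items():
        print(f"--- psm {mode} ---")
        print(text)
```

Reading the outputs next to each other should make it easier to judge which mode suits this kind of letter layout.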
Upvotes: 1