Himanshu

Reputation: 21

Extract a table into CSV from a scanned PDF using pytesseract in Python

I have several types of invoice files, and I want to find the table in each one. I can already convert a scanned PDF to images using 'pdf2jpg'. Now I need to extract the table from each invoice and write it to a CSV file using OCR with pytesseract. Please help.

Upvotes: 1

Views: 6623

Answers (1)

Hietsh Kumar

Reputation: 1329

Perhaps this code will help you:

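import pytesseract

# Path to the Tesseract executable (adjust to your installation)
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'

# Run OCR on the page image; the result is one plain string of recognized text
text = pytesseract.image_to_string('c:\\screenshot\\test.png')

# Write the text to a file. Note that this is raw OCR text, not comma-separated columns
with open('c:\\screenshot\\csvfile_1.csv', 'w') as f:
    f.write(text)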
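The snippet above saves the OCR text as-is, so table columns are not separated. One possible way to get closer to real CSV rows is to use the word positions that `pytesseract.image_to_data` returns and start a new cell wherever there is a wide horizontal gap between words. The following is only a sketch under assumptions: the helper names (`words_to_rows`, `rows_to_csv`, `ocr_image_to_rows`) and the `gap_px` threshold are hypothetical, and the threshold will likely need tuning for each invoice layout and scan resolution.

```python
import csv
import io
from collections import defaultdict


def words_to_rows(data, gap_px=40):
    """Group OCR words into rows and cells.

    `data` is a dict shaped like the output of
    pytesseract.image_to_data(..., output_type=Output.DICT).
    `gap_px` is an assumed threshold: a horizontal gap wider than
    this starts a new cell.
    """
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # skip empty entries (blocks, paragraphs, blank words)
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append((data["left"][i], data["width"][i], text.strip()))

    rows = []
    for key in sorted(lines):
        words = sorted(lines[key])  # left-to-right within the line
        cells = [words[0][2]]
        prev_right = words[0][0] + words[0][1]
        for left, width, text in words[1:]:
            if left - prev_right > gap_px:
                cells.append(text)        # wide gap: new column
            else:
                cells[-1] += " " + text   # narrow gap: same cell
            prev_right = left + width
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with the standard csv module."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def ocr_image_to_rows(image_path, gap_px=40):
    """Run Tesseract on one page image and return table-like rows."""
    # Imported here so the helpers above work without Tesseract installed.
    import pytesseract
    from pytesseract import Output
    from PIL import Image

    data = pytesseract.image_to_data(Image.open(image_path),
                                     output_type=Output.DICT)
    return words_to_rows(data, gap_px)
```

For example, `rows_to_csv(ocr_image_to_rows('c:\\screenshot\\test.png'))` would give CSV text for one page image. This groups words by the line numbers Tesseract assigns, so it tends to work best on clean scans with clearly spaced columns; tables with ruled borders or skewed pages may need image preprocessing first.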


Upvotes: 1
