Ganesh Nannaware

Reputation: 297

Convert scanned PDF to .txt files using Tesseract

I have to convert a .pdf file containing scanned images into .txt files. Tesseract OCR only converts images to .txt, so I first need to extract the pages as .tif images and then convert those. Can anyone help me with this?

Upvotes: 13

Views: 19512

Answers (1)

Karol S

Reputation: 9402

Use ImageMagick:

convert -density 600 input.pdf output.tif

Density is in DPI; in my experience, 600 DPI works best.
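
To go all the way from the PDF to a .txt file, the two steps can be chained. Below is a minimal sketch, not a definitive script: `pdf_to_txt` is a hypothetical helper name, and it assumes ImageMagick's `convert` (with Ghostscript available for reading PDFs) and `tesseract` are on your PATH, and that your Tesseract build can read the multi-page TIFF that `convert` writes for a multi-page PDF.

```shell
# Sketch: OCR a scanned PDF by rasterizing it, then running Tesseract on the image.
# pdf_to_txt is a hypothetical helper name, not part of ImageMagick or Tesseract.
pdf_to_txt() {
    pdf="$1"
    base="${pdf%.pdf}"    # strip the extension: input.pdf -> input

    # Step 1: rasterize the PDF at 600 DPI into a TIFF, as in the command above.
    # Step 2: run OCR only if step 1 succeeded. Tesseract appends ".txt" to the
    # output base name, so the text ends up in "$base.txt".
    convert -density 600 "$pdf" "$base.tif" &&
        tesseract "$base.tif" "$base"
}
```

Calling `pdf_to_txt input.pdf` should leave `input.tif` and `input.txt` next to the PDF. If 600 DPI makes the TIFF too large or the run too slow, lowering the density is the first thing to try.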

Upvotes: 22
