Reputation: 181
I am using Ghostscript 9.20 to extract the text from a PDF document that contains only two lines of text:
Hello world…
A beautiful day!
The command I am running is:
gswin32c -sDEVICE=txtwrite -o output.txt input.pdf
However, the output is:
䠀攀氀氀漀 眀漀爀氀搀☠
䄀 戀攀愀甀琀椀昀甀氀 搀愀礀℀
What is going on and how do I fix it?
Upvotes: 8
Views: 10917
Reputation: 31141
There was a bug in the 9.20 release that affected certain kinds of text extraction. It did not affect all of them: it depends on the input, and since you haven't supplied your file, it's impossible to tell whether your particular input is affected.
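As a rough diagnostic, the characters shown in the question look like UTF-16 text read with the wrong byte order: 䠀 is U+4800, which is "H" (U+0048) with its two bytes swapped. The Python sketch below only illustrates that observation. The helper name `swap_utf16_bytes` is made up for this example, and the check does not by itself show whether the file on disk is wrong or whether the viewer guessed the wrong encoding.

```python
def swap_utf16_bytes(text: str) -> str:
    """Reinterpret each UTF-16 code unit with its two bytes swapped.

    Hypothetical helper for illustration: encode as big-endian UTF-16,
    then decode the same bytes as little-endian UTF-16.
    """
    return text.encode("utf-16-be").decode("utf-16-le")


# Characters copied from the garbled output, written as escapes so that
# no whitespace or font substitution can change them.
first_word = "\u4800\u6500\u6c00\u6c00\u6f00"  # 䠀攀氀氀漀
skull = "\u2620"                               # ☠ at the end of the first line
last_char = "\u2100"                           # ℀ at the end of the second line

print(swap_utf16_bytes(first_word))  # Hello
print(swap_utf16_bytes(skull))       # … (U+2026, horizontal ellipsis)
print(swap_utf16_bytes(last_char))   # !
```

If the swapped text matches the expected lines, as it does for these samples, opening output.txt explicitly as UTF-16 with the opposite byte order in your editor is a quick way to check whether the extracted text itself is intact.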
To fix it you can:
Upvotes: 4