Parsing PDF file using Apache PDFBox

Question

I am trying to modify the contents of a PDF document using PDFBox. I used this example as is, but observed that the text in my PDF file is getting split at the character level (or worse). For example, the string "EM? what it is:" gets split into:

COSString{E}
COSString{M?}
COSString{ }
COSString{w}
COSString{hat }
COSString{it }
COSString{is}
COSString{:}

(This is what I see when printing each COSString in the above-mentioned code.) As far as I can see, there are only Latin characters in the file, and the encoding is ISO-8859-1. Any ideas?

Regards,

Salil
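
One possible explanation, assuming the file positions its glyphs individually: in a PDF content stream the TJ operator takes an array that interleaves string fragments with numeric position adjustments (kerning), so a single visible phrase can legitimately be stored as many short strings. PDFBox would expose such an array as a COSArray holding COSString and COSNumber entries. The sketch below does not call PDFBox at all; it models one such array with plain Java objects to show how the fragments listed in the question fit back together. The class name `TjFragments`, the method `join`, and the adjustment values are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

class TjFragments {
    // Hypothetical helper: concatenates the string operands of one
    // TJ-style array and skips the numeric position adjustments.
    static String join(List<Object> operands) {
        StringBuilder text = new StringBuilder();
        for (Object operand : operands) {
            if (operand instanceof String) {
                text.append((String) operand);
            }
            // Numbers only shift the next glyph; they carry no text.
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Fragments as printed in the question; the numbers between
        // them are invented placeholders for kerning adjustments.
        List<Object> operands = Arrays.<Object>asList(
                "E", -20, "M?", 15, " ", "w", -10, "hat ", "it ", 5, "is", ":");
        System.out.println(join(operands)); // prints: EM? what it is:
    }
}
```

If this is what is happening in the file, any search or replace on the text would need to work on the joined string of each array rather than on individual fragments, and then write the result back as a single string operand.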

