Reputation: 60213
UPDATE: Please see https://softwarerecs.stackexchange.com/questions/71464/java-library-to-insert-invisible-text-into-a-pdf instead.
I want to insert invisible text into an existing PDF file, to make it searchable.
What library should I use?
I would appreciate links to specific API methods to use.
Free, ideally open source.
Thanks a lot!
(For the curious: I want to automatically OCR incoming scanned papers and make them searchable in an Alfresco repository.)
Upvotes: 3
Views: 7444
Reputation: 15868
3 options. My answers are iText-specific, but you should be able to translate the underlying methods to any sufficiently advanced PDF library.
myPdfContentByte.setTextRenderMode(PdfContentByte.TEXT_RENDER_MODE_INVISIBLE);
myPdfStamper.getUnderContent(pageNum) makes this easy, and will let you draw the text under the scan. Other libraries that let you access a page's contents might require you to add your text 'in the raw' at the beginning of an existing content stream. You'll want to check out the "PDF Spec" (google that, you'll be fine) for details. Chapter 9 is all about text, including the text rendering modes.
Upvotes: 4
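For the 'in the raw' route mentioned in the iText answer, here is a minimal sketch in plain Java (no PDF library) of the content-stream operators involved. It only builds the operator text: `3 Tr` selects text rendering mode 3, which neither fills nor strokes the glyphs, so the text is invisible but still extractable. The class name, method names, and the font resource name `F1` are assumptions for illustration; the page's resource dictionary must actually define that font, and the sketch only handles text the font's encoding can represent.

```java
import java.util.Locale;

public class InvisibleTextOps {

    /** Escapes the characters that are special inside a PDF literal string. */
    public static String escapeLiteral(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Builds content-stream operators that place one line of invisible text.
     * fontResourceName is hypothetical here; it must match a font entry in
     * the page's /Resources dictionary.
     */
    public static String invisibleText(String fontResourceName, double fontSize,
                                       double x, double y, String text) {
        // q ... Q saves and restores the graphics state, so rendering mode 3
        // does not leak into the page's existing content.
        return String.format(Locale.ROOT,
                "q\nBT\n/%s %.2f Tf\n3 Tr\n%.2f %.2f Td\n(%s) Tj\nET\nQ\n",
                fontResourceName, fontSize, x, y, escapeLiteral(text));
    }

    public static void main(String[] args) {
        System.out.print(invisibleText("F1", 12, 72, 700, "Invoice (scan) 42"));
    }
}
```

Prepending the resulting fragment to a page's content stream puts the text under the scanned image; with mode 3 the drawing order no longer affects visibility, only which object a viewer treats as being on top.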
Reputation: 3450
This shows how to create a PDF document containing text and this shows how to add an image. Add the text first and then add the image on top of it - the text will become 'invisible' to the end user but will remain searchable by search engines. This may also be useful.
Upvotes: 1
Reputation: 1322
You do not have to render the text invisible. Just render it in the appropriate place and overlay the scanned image on top of it. Or, you could render the text over the image and set the alpha value of the stroke and brush colors to zero.
Upvotes: 0