Reputation: 1154
We are extracting text from PDFs using iText/PDFBox, but additional text that is invisible in the PDF also gets extracted. Is there any method or tool to get rid of this hidden text?
Upvotes: 2
Views: 6311
Reputation: 2394
There are many different ways to add hidden text to a PDF.
Each PDF may use a different method, and to separate the hidden text from the visible text you may need to know how the hidden text is implemented.
Does iText have an option to return the text colour? If it does, you can try ignoring white text objects.
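As a rough sketch of that idea, the filter below drops text chunks that look hidden. It is library-independent on purpose: `TextChunk`, `isLikelyHidden` and `visibleText` are hypothetical names made up for illustration, not iText or PDFBox API, and you would have to fill the chunks from whatever colour and rendering-mode information your library reports. It assumes a white page background and also treats PDF text rendering mode 3 (neither fill nor stroke) as invisible; it will not catch other tricks such as text placed behind images or outside the page.

```java
import java.util.List;

public class HiddenTextFilter {

    /** Hypothetical stand-in for the per-chunk data reported by the PDF library. */
    public static final class TextChunk {
        final String text;
        final int renderMode;   // PDF text rendering mode (Tr operator)
        final float r, g, b;    // fill colour as RGB components in the range 0..1

        public TextChunk(String text, int renderMode, float r, float g, float b) {
            this.text = text;
            this.renderMode = renderMode;
            this.r = r;
            this.g = g;
            this.b = b;
        }
    }

    // In the PDF specification, rendering mode 3 means "neither fill nor stroke".
    static final int RENDER_MODE_INVISIBLE = 3;

    /** Heuristic only: invisible rendering mode, or (near-)white fill on an assumed white page. */
    public static boolean isLikelyHidden(TextChunk c) {
        if (c.renderMode == RENDER_MODE_INVISIBLE) {
            return true;
        }
        return c.r >= 0.99f && c.g >= 0.99f && c.b >= 0.99f;
    }

    /** Concatenates the text of all chunks that do not look hidden, in order. */
    public static String visibleText(List<TextChunk> chunks) {
        StringBuilder sb = new StringBuilder();
        for (TextChunk c : chunks) {
            if (!isLikelyHidden(c)) {
                sb.append(c.text);
            }
        }
        return sb.toString();
    }
}
```

If the hidden text in your files turns out to use a different mechanism, only `isLikelyHidden` needs to change.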
Upvotes: 2