Reputation: 678
Based on the answer to the question "Get the exact Stringposition in PDF", I can now get all the strings in a PDF file. Please have a look at the code:
PdfReader reader = new PdfReader("file.pdf");
RenderListener listener = new MyTextRenderListener();
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
parser.processContent(1, listener);

static class MyTextRenderListener implements RenderListener {
    @Override
    public void renderText(TextRenderInfo renderInfo) {
        String text = renderInfo.getText(); // line with text
    }

    @Override
    public void beginTextBlock() { }

    @Override
    public void endTextBlock() { }

    @Override
    public void renderImage(ImageRenderInfo renderInfo) { }
}
mkl in his answer wrote:
if your RenderListener, in addition to inspecting the text with getText(), also considers getBaseline() or even getAscentLine() and getDescentLine(), you have all the coordinates you will likely need.
In fact, TextRenderInfo exposes a few instances of the LineSegment class, each of which gives some sort of coordinates. How do I use those coordinates (by transforming them or extracting the appropriate values) to prepare a Rectangle object so that the text that is found can be removed? A Rectangle object has four coordinates that describe the position of the given text.
An example of removing strings (i.e. redacting) by using a Rectangle object can be found on SO ("Remove text occurrences contained in a specified area with iText").
UPDATE
I managed to do what I wanted by trial-and-error but I consider this a workaround and not a proper solution.
@Override
public void renderText(TextRenderInfo renderInfo) {
    LineSegment baseline = renderInfo.getBaseline();
    float x = baseline.getStartPoint().get(Vector.I1);
    float y = baseline.getStartPoint().get(Vector.I2);
    float xx = baseline.getEndPoint().get(Vector.I1);
    float yy = baseline.getEndPoint().get(Vector.I2);
    rectangle = new Rectangle(x, yy, xx, y + 5);
}
Now I have a Rectangle object (note that I add 5 to one of its coordinates, a value found by playing with the coordinates until the rectangle covered the whole string) and I can redact the text. It works fine on a uniform background colour (e.g. white) when there is no image. When the text sits on an image or the page colour is a different colour than black, it fails. That's why I describe my solution as a workaround. To me, it would be better to blank the text (replace it with an empty string). How could this be done?
Response to mkl's comment: I'm not sure if I've done it right:
LineSegment descentLine = renderInfo.getDescentLine();
float x = descentLine.getStartPoint().get(Vector.I1);
float y = descentLine.getStartPoint().get(Vector.I2);
float xx = descentLine.getEndPoint().get(Vector.I1);
float yy = descentLine.getEndPoint().get(Vector.I2);
rectangle = new Rectangle(xx, yy, x, y);
I've also used the ascent line the same way. Unfortunately, neither of these has worked.
Upvotes: 1
Views: 481
Reputation: 95898
In all your attempts you tried to construct the rectangle from a single line, originally the base line, later the descent line. With such an approach you obviously don't have the height of the rectangle and can only guess.
Instead of that you should make use of both the descent and ascent lines!
E.g. assuming the simplified case of text drawn upright:
LineSegment ascentLine = renderInfo.getAscentLine();
LineSegment descentLine = renderInfo.getDescentLine();
float llx = descentLine.getStartPoint().get(Vector.I1);
float lly = descentLine.getStartPoint().get(Vector.I2);
float urx = ascentLine.getEndPoint().get(Vector.I1);
float ury = ascentLine.getEndPoint().get(Vector.I2);
rectangle = new Rectangle(llx, lly, urx, ury);
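If the text is not drawn upright, the descent start and ascent end are no longer guaranteed to be the lower-left and upper-right corners. A minimal sketch of one way to handle that, assuming you only need an axis-aligned box around the chunk: take the minimum and maximum over all four end points of the two lines. The class TextBounds and the method boundingBox are hypothetical helpers, not part of iText; the float pairs stand for the values you would read with get(Vector.I1) and get(Vector.I2) from the start and end points of getDescentLine() and getAscentLine().

```java
import java.util.Arrays;

// Hypothetical helper, not part of iText: computes an axis-aligned box
// from the four end points of a chunk's descent and ascent lines.
final class TextBounds {

    /** Returns {llx, lly, urx, ury} enclosing all four {x, y} points. */
    static float[] boundingBox(float[] descentStart, float[] descentEnd,
                               float[] ascentStart, float[] ascentEnd) {
        float[][] corners = {descentStart, descentEnd, ascentStart, ascentEnd};
        float llx = Float.POSITIVE_INFINITY;
        float lly = Float.POSITIVE_INFINITY;
        float urx = Float.NEGATIVE_INFINITY;
        float ury = Float.NEGATIVE_INFINITY;
        for (float[] p : corners) {
            llx = Math.min(llx, p[0]);
            lly = Math.min(lly, p[1]);
            urx = Math.max(urx, p[0]);
            ury = Math.max(ury, p[1]);
        }
        return new float[] {llx, lly, urx, ury};
    }

    public static void main(String[] args) {
        // Upright text: descent line at y = 20, ascent line at y = 32.
        float[] upright = boundingBox(
                new float[] {10f, 20f}, new float[] {50f, 20f},
                new float[] {10f, 32f}, new float[] {50f, 32f});
        System.out.println(Arrays.toString(upright));

        // Text rotated by 90 degrees: both lines run vertically.
        float[] rotated = boundingBox(
                new float[] {20f, 10f}, new float[] {20f, 50f},
                new float[] {8f, 10f}, new float[] {8f, 50f});
        System.out.println(Arrays.toString(rotated));
    }
}
```

The four returned values can be passed to new Rectangle(llx, lly, urx, ury) in the same order as above. For text at an arbitrary angle the box is larger than the text itself, so it may also cover neighbouring content.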
Upvotes: 0