Reputation: 523
I'm parsing a PDF using PDFBox and I'm trying to get the text color. I can get other properties like font, size, and position no problem using TextPosition attributes. Here's how I'm doing it:
@Override
protected void writeString(String string, List<TextPosition> textPositions) {
    for (TextPosition textPosition : textPositions) {
        System.out.println(textPosition.getFont());
        System.out.println(textPosition.getFontSizeInPt());
        System.out.println(textPosition.getXDirAdj() + ", " + textPosition.getYDirAdj());
    }
}
However, I'm unable to retrieve the color of the text. I've searched Google for a solution but nothing has worked so far. Every tutorial I see seems to be using an old version of PDFBox. I don't have several of the methods that these people are using. For example, in one SO question they recommended using this code:
@Override
protected void processTextPosition(TextPosition text) {
    try {
        PDGraphicsState graphicsState = getGraphicsState();
        System.out.println("R = " + graphicsState.getNonStrokingColor().getJavaColor().getRed());
        System.out.println("G = " + graphicsState.getNonStrokingColor().getJavaColor().getGreen());
        System.out.println("B = " + graphicsState.getNonStrokingColor().getJavaColor().getBlue());
    }
    catch (IOException ioe) {}
}
When I try to use this, IntelliJ tells me "getJavaColor()" is undefined. I have also tried with this code:
@Override
protected void processTextPosition(TextPosition text) {
    try {
        PDGraphicsState graphicsState = getGraphicsState();
        System.out.println("R = " + graphicsState.getNonStrokingColor().toRGB());
    }
    catch (IOException ioe) { System.out.println(ioe); }
}
The method is called as expected, and the expected number of times, but it always prints "R = 0", even though my PDF file contains both black text and red text.
Here are my Maven dependencies:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.17</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/fontbox -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>2.0.17</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox-tools -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox-tools</artifactId>
        <version>2.0.17</version>
    </dependency>
</dependencies>
Any help is appreciated.
Upvotes: 0
Views: 1712
Reputation: 523
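Apparently, in PDFBox 2.0.0 and later, PDFTextStripper does not register the color operators by default, so the graphics state always reports the default color (black, hence the 0). To fix that, add these lines of code: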
// All of these operator classes live in org.apache.pdfbox.contentstream.operator.color
addOperator(new SetStrokingColorSpace());
addOperator(new SetNonStrokingColorSpace());
addOperator(new SetStrokingDeviceCMYKColor());
addOperator(new SetNonStrokingDeviceCMYKColor());
addOperator(new SetNonStrokingDeviceRGBColor());
addOperator(new SetStrokingDeviceRGBColor());
addOperator(new SetNonStrokingDeviceGrayColor());
addOperator(new SetStrokingDeviceGrayColor());
addOperator(new SetStrokingColor());
addOperator(new SetStrokingColorN());
addOperator(new SetNonStrokingColor());
addOperator(new SetNonStrokingColorN());
to the constructor of your PDFTextStripper subclass. Now if you use:
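@Override
protected void processTextPosition(TextPosition textPosition) {
    try {
        PDGraphicsState graphicsState = getGraphicsState();
        // toRGB() returns the fill color as one packed RGB int
        System.out.println(graphicsState.getNonStrokingColor().toRGB());
    }
    catch (IOException ioe) {
        // Report the failure instead of silently swallowing it
        System.out.println(ioe);
    }
}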
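it prints the actual text color as a packed RGB integer (0 for black, 16711680, i.e. 0xFF0000, for pure red) instead of always printing 0.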
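Since toRGB() gives you a single int rather than separate channels, here is a small sketch of how you could split it back into R, G and B, which is what the question was after in the first place. It uses plain Java only. The class and method names (RgbUnpack, red, green, blue, describe) are made up for illustration, and it assumes the value is packed as 0xRRGGBB, which is how PDFBox 2.0.x builds it as far as I can tell.

```java
class RgbUnpack {
    // Assumption: rgb is packed as 0xRRGGBB (e.g. the int returned by PDColor.toRGB()).
    public static int red(int rgb) {
        return (rgb >> 16) & 0xFF; // top byte of the 24-bit value
    }

    public static int green(int rgb) {
        return (rgb >> 8) & 0xFF;  // middle byte
    }

    public static int blue(int rgb) {
        return rgb & 0xFF;         // lowest byte
    }

    // Formats the channels the same way the question's println calls did.
    public static String describe(int rgb) {
        return "R = " + red(rgb) + ", G = " + green(rgb) + ", B = " + blue(rgb);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x000000)); // prints "R = 0, G = 0, B = 0"
        System.out.println(describe(0xFF0000)); // prints "R = 255, G = 0, B = 0"
    }
}
```

Inside processTextPosition you would pass the result of graphicsState.getNonStrokingColor().toRGB() to describe(...) instead of printing the raw int.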
Upvotes: 1