Encoding problems with PDF

Question

I have a (quite simple) Java Spring Boot REST service that renders a PDF from its input, and I am testing it with IntelliJ.

I use PDFBox to create the PDFs.

One feature is that the client can supply annexes as byte[] in addition to the regular content.

Problem

When users try the service, the final document has blank pages, but only in the annexes part.

Investigation

Testing with the IntelliJ HTTP REST client gives the same issue.
Saving the annexes into separate files gives clear and correct documents.
Saving the whole document (regular content + annexes) into a file is correct as well.
Using Postman, the document is fine.

When I noticed that it works fine with Postman, I changed the IntelliJ default file encoding for the generated response file (from UTF-8 to ISO-8859-1), and after that the documents were clear and correct. Note that this problem seems to affect only the annexes; the regular content is always fine.
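The observation above is consistent with the response body being decoded as text somewhere along the way. Here is a minimal sketch of why the charset matters for binary data: ISO-8859-1 maps every byte to exactly one character, so a round trip is lossless, while UTF-8 replaces invalid byte sequences and cannot restore them. The class name, the method name and the four sample bytes are made up for illustration and are not taken from a real PDF.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // Decodes bytes as text in the given charset, then encodes the text back to bytes.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Hypothetical sample: two ASCII bytes followed by two high bytes,
        // standing in for the binary data of a compressed PDF stream.
        byte[] raw = {0x25, 0x50, (byte) 0xE2, (byte) 0xFF};

        boolean latin1Ok = Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1));
        boolean utf8Ok = Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8));

        System.out.println("ISO-8859-1 round trip intact: " + latin1Ok); // true
        System.out.println("UTF-8 round trip intact: " + utf8Ok);        // false
    }
}
```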

Question

I suppose this is an encoding problem in the annexes' content. Am I correct?
Can I handle this on my side without impacting the users of the service, that is, without requiring development on their side?

Other Information

I tried many byte conversions without success, for instance:

new String(annexe, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.UTF_8);

But each time I got an exception:

java.io.IOException: java.util.zip.DataFormatException: invalid stored block lengths
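A small sketch of what this kind of conversion does to binary data: every byte at or above 0x80 becomes two bytes in UTF-8, so the array grows and its content changes. That would be consistent with the compressed streams inside the annex no longer inflating, although the sketch does not reproduce the exception itself. `TranscodeDemo`, `transcode` and the sample bytes are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TranscodeDemo {
    // Treats the bytes as ISO-8859-1 text and re-encodes that text as UTF-8,
    // mirroring the conversion attempted above.
    static byte[] transcode(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample: two ASCII bytes and two bytes >= 0x80.
        byte[] raw = {0x25, 0x50, (byte) 0xE2, (byte) 0xFF};
        byte[] out = transcode(raw);

        // Each high byte is expanded to a two-byte UTF-8 sequence.
        System.out.println(raw.length + " bytes in, " + out.length + " bytes out"); // 4 bytes in, 6 bytes out
        System.out.println("unchanged: " + Arrays.equals(raw, out));               // unchanged: false
    }
}
```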

The document is sent back as byte[] like this:

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
pdfDocument.save(outputStream);
pdfDocument.close();
return outputStream.toByteArray();

Saving the document to a file uses nearly the same code; a FileOutputStream is passed instead.

Annexes are added to the document like this:

for (byte[] content : annexes) {
    PDDocument annex = PDDocument.load(content);
    for (PDPage page : annex.getPages()) {
        pdfDocument.importPage(page);
    }
}

I also tried PDFMergerUtility but got the same result (blank pages for the annexes).

LessThanTrue · Accepted Answer

Thanks to Tilman Hausherr's suggestion, I tried encoding the byte[] with Base64.getEncoder().encode(...), and this does the job!

The client now has to deal with a Base64-encoded string, but at least it works.
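For readers who want to see the idea in isolation, here is a minimal sketch of the Base64 round trip using only java.util.Base64. The class name, the method names and the sample bytes are hypothetical, and which side encodes or decodes depends on your own service. It uses encodeToString, which returns the same Base64 data as encode(...) but as a String.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Transport {
    // Sender side: turn binary PDF bytes into ASCII-only text.
    static String encode(byte[] pdfBytes) {
        return Base64.getEncoder().encodeToString(pdfBytes);
    }

    // Receiver side: recover the original bytes from the Base64 string.
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        // Hypothetical sample: "%PDF" followed by two high bytes.
        byte[] pdfBytes = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xFF};

        String encoded = encode(pdfBytes);
        byte[] decoded = decode(encoded);

        System.out.println(encoded);
        System.out.println("round trip intact: " + Arrays.equals(pdfBytes, decoded)); // true
    }
}
```

Because the encoded form contains only ASCII characters, it is not affected by whichever charset a client uses to read the response as text. The cost is that the payload grows by roughly one third.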

Thank you!
