MrD

Reputation: 2481

How to read PDF from the .jar file

In my Maven project I have a PDF file located in the resources folder. My function reads the PDF file from the resources folder and adds some values to the document based on the user's data.

This project is packaged as a .jar file using mvn clean install and is used as a dependency in my other Spring Boot application.

In my Spring Boot project I create an instance of the class that does the work on the PDF. Once all the work is done and the PDF is saved to the file system, it is always empty (all pages are blank). I have the impression that mvn clean install does something to the PDF file. Here is what I've tried so far:

First way

ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
File file = new ClassPathResource("/pdfs/testpdf.pdf").getFile(); // Try to get the PDF file

PDDocument pdf = PDDocument.load(file); // Load the PDF document from the file
List<PDField> fields = forms.getFields(); // Get the input fields that I want to update in the PDF
fieldsMap.forEach(throwingConsumerWrapper((field, value) -> changeField(fields, field, value))); // Set the input field values

pdf.save(byteArrayOutputStream); // Save the document to the byte array

This works great, but as soon as the project is packaged as a .jar file, new ClassPathResource("/pdfs/testpdf.pdf").getFile() throws an exception saying it can't find the specified file.

This is expected, because the File class can't access anything inside a .jar file (it can only access the .jar file itself).
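A quick way to see the difference is to look at the URL the classpath lookup returns. The sketch below is only illustrative: the class name ResourceLocation and the helper isPlainFile are made up for this example, and it assumes the resource path from the snippet above.

```java
import java.net.URL;

final class ResourceLocation {
    // True only when the resource is a standalone file on disk ("file:" URL).
    // Inside a packaged .jar the URL uses the "jar:" protocol instead,
    // which java.io.File cannot open.
    static boolean isPlainFile(URL resourceUrl) {
        return resourceUrl != null && "file".equals(resourceUrl.getProtocol());
    }

    public static void main(String[] args) {
        // Hypothetical path taken from the question; url is null if it is not on the classpath.
        URL url = ResourceLocation.class.getResource("/pdfs/testpdf.pdf");
        System.out.println(url + " -> plain file? " + isPlainFile(url));
    }
}
```

Run from the IDE (resources in a target/classes directory) this would be expected to report a file: URL, and run from the packaged .jar a jar: URL, which is why getFile() only works in the first case.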

So, the solution to that problem is to use the InputStream instead of the File. Here is what I did:

Second way

ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
InputStream inputStream = new ClassPathResource("/pdfs/testpdf.pdf").getInputStream(); // Try to get the input stream

PDDocument pdf = PDDocument.load(inputStream); // Load the PDF document from the input stream
List<PDField> fields = forms.getFields(); // Get the input fields that I want to update in the PDF
fieldsMap.forEach(throwingConsumerWrapper((field, value) -> changeField(fields, field, value))); // Set the input field values

pdf.save(byteArrayOutputStream); // Save the document to the byte array

This time getInputStream() doesn't throw an error and the inputStream object is not null. But the PDF file, once saved to my file system, is empty: all pages are blank.

I even tried copying the complete inputStream to a file byte by byte, and noticed that every byte is equal to 0. Here is what I did:

Third way

InputStream inputStream = new ClassPathResource("/pdfs/test.pdf").getInputStream();
byte[] buffer = new byte[inputStream.available()];
inputStream.read(buffer);

File targetFile = new File(OUTPUT_FOLDER);
OutputStream outStream = new FileOutputStream(targetFile);
outStream.write(buffer);

The copied test.pdf is saved, but Adobe Reader reports it as corrupted when I open it.

Does anyone have an idea how to fix this?

Upvotes: 1

Views: 2603

Answers (2)

MrD

Reputation: 2481

After a few hours of investigation and good input from @Simon Martinelli and @Tilman Hausherr, I found I had two issues to solve:

First issue - Read the file correctly

To read a file from the resources folder once it is packed in a .jar, you have to go through the classpath. As stated above, the File class can't read a file inside a .jar, so I used the following:

InputStream inputStream = CreatePDF.class.getResourceAsStream("/pdfs/test.pdf");
PDDocument pdf = PDDocument.load(inputStream);

In my case the code runs in a static method of CreatePDF, where this is not available. In an instance method you can use the following instead:

InputStream inputStream = this.getClass().getResourceAsStream("/pdfs/test.pdf");
PDDocument pdf = PDDocument.load(inputStream);
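Note that getResourceAsStream returns null when the resource is not found, and the stream should be closed after use. A minimal sketch of a helper that handles both, assuming Java 9+ for readAllBytes (the class name ResourceBytes and its read method are hypothetical, not part of my project or of any library):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ResourceBytes {
    // Reads a classpath resource fully into memory.
    // A path with a leading slash is resolved from the classpath root.
    static byte[] read(Class<?> anchor, String path) {
        // try-with-resources closes the stream; a null resource is simply skipped on close
        try (InputStream in = anchor.getResourceAsStream(path)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found on classpath: " + path);
            }
            return in.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With PDFBox 2.x (which the PDDocument.load calls above suggest), the resulting array could then be passed to PDDocument.load(byte[]), for example PDDocument.load(ResourceBytes.read(CreatePDF.class, "/pdfs/test.pdf")).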

Second issue - My original problem

In the third example of my question, copying the file byte by byte from the resources to a local folder produced only zero bytes. I knew that couldn't be correct, so I tried the same thing with a simple .txt file, and that worked. So the build (mvn clean install) was doing something to the PDF file specifically. After some investigation I realized that Maven resource filtering was the cause. Filtering processes each file as text in order to replace ${...} placeholders, which damages binary content. If resource filtering is enabled:

<resource>
    <directory>src/main/resources</directory>
    <filtering>true</filtering>
</resource>

then binary files get corrupted, and that was my original problem. When I set filtering to false, it worked as expected.
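To confirm whether the packaged resource survived the build, it helps to check the first bytes: a well-formed PDF normally starts with the ASCII marker %PDF-, while my corrupted copy contained only zeros. A small sketch of such a check (PdfHeaderCheck and looksLikePdf are names invented for this example):

```java
import java.nio.charset.StandardCharsets;

final class PdfHeaderCheck {
    // The marker a PDF file normally begins with.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the data begins with the PDF marker.
    // This only inspects the header; it does not validate the rest of the file.
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Running this against the bytes read from the .jar before and after changing the filtering setting would show whether the build is still altering the file.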

Here is the warning from the Maven documentation:

Warning: Do not filter files with binary content like images! This will most likely result in corrupt output.

If you have both text files and binary files as resources it is recommended to have two separated folders. One folder src/main/resources (default) for the resources which are not filtered and another folder src/main/resources-filtered for the resources which are filtered.

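Here is an example that instead keeps a single folder and declares it twice, so that only the listed text types are filtered and PDFs are copied as-is (file types not listed in either block are not copied at all):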

<resource>
    <directory>src/main/resources</directory>
    <filtering>true</filtering>
    <includes>
        <include>**/*.properties</include>
        <include>**/*.xml</include>
        <include>**/*.txt</include>
        <include>**/*.html</include>
    </includes>
</resource>
<resource>
    <directory>src/main/resources</directory>
    <filtering>false</filtering>
    <includes>
        <include>**/*.pdf</include>
    </includes>
</resource>

Upvotes: 2

Simon Martinelli

Reputation: 36143

You have to load it like this:

InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("pdfs/testpdf.pdf");

If you load it via the ClassLoader, the path is always resolved from the root of the classpath, so it must not start with a slash.
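The two lookup styles differ only in how the path is written. As a rough sketch (the class ResourcePaths and its methods are hypothetical helpers for illustration), a class-style absolute path can be converted to the ClassLoader form by dropping the leading slash:

```java
import java.io.IOException;
import java.io.InputStream;

final class ResourcePaths {
    // Class.getResourceAsStream("/pdfs/x.pdf") and
    // ClassLoader.getResourceAsStream("pdfs/x.pdf") address the same resource.
    static String toClassLoaderPath(String absoluteClassPath) {
        return absoluteClassPath.startsWith("/")
                ? absoluteClassPath.substring(1)
                : absoluteClassPath;
    }

    // Reports whether the given loader can find the resource (path without leading slash).
    static boolean existsViaClassLoader(ClassLoader loader, String path) {
        try (InputStream in = loader.getResourceAsStream(path)) {
            return in != null;
        } catch (IOException e) {
            return false; // only close() can fail here
        }
    }

    // Reports whether the class-based lookup can find the resource (leading slash = classpath root).
    static boolean existsViaClass(Class<?> anchor, String path) {
        try (InputStream in = anchor.getResourceAsStream(path)) {
            return in != null;
        } catch (IOException e) {
            return false; // only close() can fail here
        }
    }
}
```

For example, existsViaClass(getClass(), "/pdfs/testpdf.pdf") and existsViaClassLoader(getClass().getClassLoader(), toClassLoaderPath("/pdfs/testpdf.pdf")) should agree when the file is on the classpath.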

Upvotes: 2
