Reputation:
I am trying to convert a docx
file which contains table and images into a pdf
format file.
I have been searching everywhere but did not get proper solution, request to give proper and correct solution:
here what i have tried :
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class TestCon {
public static void main(String[] args) {
TestCon cwoWord = new TestCon();
System.out.println("Start");
cwoWord.ConvertToPDF("D:\\Test.docx", "D:\\Test1.pdf");
}
public void ConvertToPDF(String docPath, String pdfPath) {
try {
InputStream doc = new FileInputStream(new File(docPath));
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(new File(pdfPath));
PdfConverter.getInstance().convert(document, out, options);
System.out.println("Done");
} catch (FileNotFoundException ex) {
System.out.println(ex.getMessage());
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}
Exception:
Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log(ILjava/lang/Object;)V from class org.apache.poi.openxml4j.opc.PackageRelationshipCollection
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:313)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:162)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:130)
at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:559)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:112)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:83)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:128)
at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:239)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:665)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:121)
at test.TestCon.ConvertToPDF(TestCon.java:31)
at test.TestCon.main(TestCon.java:25)
My requirement is to create a java code to convert existing docx into pdf with proper format and alignment.
Please suggest.
Jars Used:
Upvotes: 35
Views: 112394
Reputation: 41
Just a thought.
What if LibreOffice(OpenSource) command line utility is to be used for the purpose!!!
Yes, from Java program these commands can be invoked.
Details of the solution is here Usage of LibreOffice command line utility for Doc to PDF conversion
Hope it helps.
Upvotes: 0
Reputation: 91
Let me give you an example in Kotlin With dependencies that work for me (the method from the question did not change)
//gradle.kts
implementation("fr.opensagres.xdocreport:fr.opensagres.poi.xwpf.converter.pdf:2.0.4")
implementation("org.apache.pdfbox:pdfbox:3.0.0")
kotlin
import org.apache.poi.xwpf.usermodel.XWPFDocument
import java.io.*
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions
fun ConvertToPDF(docPath: String?, pdfPath: String?) {
try {
val doc: InputStream = FileInputStream(File(docPath))
val document = XWPFDocument(doc)
val options = PdfOptions.create()
val out: OutputStream = FileOutputStream(File(pdfPath))
PdfConverter.getInstance().convert(document, out, options)
println("Done")
} catch (ex: FileNotFoundException) {
println(ex.message)
} catch (ex: IOException) {
println(ex.message)
}
}
Upvotes: -1
Reputation: 3896
If your document is pretty rich and your option is to do the converting on Linux/Unix then all three main options suggested in the thread could be "a bit" painful to implement.
A solution which I might suggest is to use Gotenberg: A Docker-powered stateless API for converting HTML, Markdown and Office documents to PDF.
$ docker run --rm -p 3000:3000 thecodingmachine/gotenberg:7
curl
:$ curl \
--request POST 'http://localhost:3000/forms/libreoffice/convert' \
--form 'files=@"/path/to/file.docx"' \
-o result.pdf
Deploy it to your infrastructure (e.g. as a separate microservice) and shoot it from your Java service making that simple HTTP request. Get your PDF file in the response and do what you want with it.
Tested, and works like a charm!
Upvotes: 8
Reputation:
In addition to the VivekRatanSinha answer, i would i like to post full code and required jars for the people who need it in future.
Code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class WordConvertPDF {
public static void main(String[] args) {
WordConvertPDF cwoWord = new WordConvertPDF();
cwoWord.ConvertToPDF("D:/Test.docx", "D:/Test.pdf");
}
public void ConvertToPDF(String docPath, String pdfPath) {
try {
InputStream doc = new FileInputStream(new File(docPath));
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(new File(pdfPath));
PdfConverter.getInstance().convert(document, out, options);
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}
and JARS:
Enjoy :)
Upvotes: 32
Reputation: 11542
I have done a lot of research and found Documents4j is the best free API for converting docx to pdf. Alignment, font everything documents4j doing good job.
Maven Dependencies:
<dependency>
<groupId>com.documents4j</groupId>
<artifactId>documents4j-local</artifactId>
<version>1.0.3</version>
</dependency>
<dependency>
<groupId>com.documents4j</groupId>
<artifactId>documents4j-transformer-msoffice-word</artifactId>
<version>1.0.3</version>
</dependency>
Use the below code for convert docx to pdf.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;
public class Document4jApp {
public static void main(String[] args) {
File inputWord = new File("Tests.docx");
File outputFile = new File("Test_out.pdf");
try {
InputStream docxInputStream = new FileInputStream(inputWord);
OutputStream outputStream = new FileOutputStream(outputFile);
IConverter converter = LocalConverter.builder().build();
converter.convert(docxInputStream).as(DocumentType.DOCX).to(outputStream).as(DocumentType.PDF).execute();
outputStream.close();
System.out.println("success");
} catch (Exception e) {
e.printStackTrace();
}
}
}
Upvotes: 8
Reputation: 11542
Docx4j is open source and the best API for convert Docx to pdf without any alignment or font issue.
Maven Dependencies:
<dependency>
<groupId>org.docx4j</groupId>
<artifactId>docx4j-JAXB-Internal</artifactId>
<version>8.0.0</version>
</dependency>
<dependency>
<groupId>org.docx4j</groupId>
<artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
<version>8.0.0</version>
</dependency>
<dependency>
<groupId>org.docx4j</groupId>
<artifactId>docx4j-JAXB-MOXy</artifactId>
<version>8.0.0</version>
</dependency>
<dependency>
<groupId>org.docx4j</groupId>
<artifactId>docx4j-export-fo</artifactId>
<version>8.0.0</version>
</dependency>
Code:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
public class DocToPDF {
public static void main(String[] args) {
try {
InputStream templateInputStream = new FileInputStream("D:\\\\Workspace\\\\New\\\\Sample.docx");
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
String outputfilepath = "D:\\\\Workspace\\\\New\\\\Sample.pdf";
FileOutputStream os = new FileOutputStream(outputfilepath);
Docx4J.toPDF(wordMLPackage,os);
os.flush();
os.close();
} catch (Throwable e) {
e.printStackTrace();
}
}
}
Upvotes: 1
Reputation: 79
I had a very complex document and Microsoft Graph Api helped to produce the correct pdf with the exact formatting as in the input Docx
The following links help complete App Registration for setting up the API key/secrets to access the Graph Api https://medium.com/medialesson/convert-files-to-pdf-using-microsoft-graph-azure-functions-20bc84d2adc4
https://devzigma.com/java/upload-files-to-sharepoint-using-java/
Once the docx document is uploaded, see this to be able to download the pdf version of the same docx: https://learn.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-1.0&tabs=java
Upvotes: 2
Reputation: 201
I will provide 3 methods to convert docx to pdf :
Code :
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class ConvertDocToPdfitext {
public static void main(String[] args) {
System.out.println( "Starting conversion!!!" );
ConvertDocToPdfitext cwoWord = new ConvertDocToPdfitext();
cwoWord.ConvertToPDF("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx", "C:/Users/avijit.shaw/Desktop/testing/docx/Test-1.pdf");
System.out.println( "Ending conversion!!!" );
}
public void ConvertToPDF(String docPath, String pdfPath) {
try {
InputStream doc = new FileInputStream(new File(docPath));
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(new File(pdfPath));
PdfConverter.getInstance().convert(document, out, options);
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}
Dependencies: Use Maven to resolve dependencies.
New version 2.0.2 of fr.opensagres.poi.xwpf.converter.core runs with apache poi 4.0.1 and itext 2.17. You just need to add below dependency in Maven and then maven will auto download all dependent dependencies. (Updated your Maven project, so it downloaded all these libraries and all of its dependencies)
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
<version>2.0.2</version>
</dependency>
Note: You need to have MS Office installed on the machine in which this code is running.
Code :
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;
public class Document4jApp {
public static void main(String[] args) {
File inputWord = new File("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx");
File outputFile = new File("Test_out.pdf");
try {
InputStream docxInputStream = new FileInputStream(inputWord);
OutputStream outputStream = new FileOutputStream(outputFile);
IConverter converter = LocalConverter.builder().build();
converter.convert(docxInputStream).as(DocumentType.DOCX).to(outputStream).as(DocumentType.PDF).execute();
outputStream.close();
System.out.println("success");
} catch (Exception e) {
e.printStackTrace();
}
}
}
Dependencies : Use Maven to resolve dependencies.
<dependency>
<groupId>com.documents4j</groupId>
<artifactId>documents4j-local</artifactId>
<version>1.0.3</version>
</dependency>
<dependency>
<groupId>com.documents4j</groupId>
<artifactId>documents4j-transformer-msoffice-word</artifactId>
<version>1.0.3</version>
</dependency>
Note: You need to have OpenOffice installed on the machine in which this code is running. Code :
import java.io.File;
import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.BootstrapException;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XDesktop;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.Exception;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;
import ooo.connector.BootstrapSocketConnector;
public class App {
public static void main(String[] args) throws Exception, BootstrapException {
System.out.println("Stating conversion!!!");
// Initialise
String oooExeFolder = "C:\\Program Files (x86)\\OpenOffice 4\\program"; //Provide path on which OpenOffice is installed
XComponentContext xContext = BootstrapSocketConnector.bootstrap(oooExeFolder);
XMultiComponentFactory xMCF = xContext.getServiceManager();
Object oDesktop = xMCF.createInstanceWithContext("com.sun.star.frame.Desktop", xContext);
XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(XDesktop.class, oDesktop);
// Load the Document
String workingDir = "C:/Users/avijit.shaw/Desktop/testing/docx/"; //Provide directory path of docx file to be converted
String myTemplate = workingDir + "Account Opening Prototype Details.docx"; // Name of docx file to be converted
if (!new File(myTemplate).canRead()) {
throw new RuntimeException("Cannot load template:" + new File(myTemplate));
}
XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime
.queryInterface(com.sun.star.frame.XComponentLoader.class, xDesktop);
String sUrl = "file:///" + myTemplate;
PropertyValue[] propertyValues = new PropertyValue[0];
propertyValues = new PropertyValue[1];
propertyValues[0] = new PropertyValue();
propertyValues[0].Name = "Hidden";
propertyValues[0].Value = new Boolean(true);
XComponent xComp = xCompLoader.loadComponentFromURL(sUrl, "_blank", 0, propertyValues);
// save as a PDF
XStorable xStorable = (XStorable) UnoRuntime.queryInterface(XStorable.class, xComp);
propertyValues = new PropertyValue[2];
// Setting the flag for overwriting
propertyValues[0] = new PropertyValue();
propertyValues[0].Name = "Overwrite";
propertyValues[0].Value = new Boolean(true);
// Setting the filter name
propertyValues[1] = new PropertyValue();
propertyValues[1].Name = "FilterName";
propertyValues[1].Value = "writer_pdf_Export";
// Appending the favoured extension to the origin document name
String myResult = workingDir + "letterOutput.pdf"; // Name of pdf file to be output
xStorable.storeToURL("file:///" + myResult, propertyValues);
System.out.println("Saved " + myResult);
// shutdown
xDesktop.terminate();
}
}
Dependencies : Use Maven to resolve dependencies.
<!-- https://mvnrepository.com/artifact/org.openoffice/unoil -->
<dependency>
<groupId>org.openoffice</groupId>
<artifactId>unoil</artifactId>
<version>3.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.openoffice/juh -->
<dependency>
<groupId>org.openoffice</groupId>
<artifactId>juh</artifactId>
<version>3.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.openoffice/bootstrap-connector -->
<dependency>
<groupId>org.openoffice</groupId>
<artifactId>bootstrap-connector</artifactId>
<version>0.1.1</version>
</dependency>
Upvotes: 18
Reputation: 71
Only ONE dependency needs to manually added (the others should be pulled in automatically). As of right now, the latest version is 2.0.2.
Gradle:
dependencies {
// What you should already have
implementation "org.apache.poi:poi-ooxml:latest.release"
// ADD THE BELOW LINE TO DEPENDENCIES BLOCK IN build.gradle
implementation 'fr.opensagres.xdocreport:fr.opensagres.poi.xwpf.converter.pdf:2.0.2'
}
Maven only:
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
<version>2.0.2</version>
</dependency>
Upvotes: 1
Reputation: 512
You need to add these maven dependencies
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-scratchpad</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml-schemas</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-excelant</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-examples</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>org.apache.poi.xwpf.converter.core</artifactId>
<version>1.0.6</version>
</dependency>
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>org.apache.poi.xwpf.converter.pdf</artifactId>
<version>1.0.6</version>
</dependency>
Upvotes: 2
Reputation: 751
I use this code.
private byte[] toPdf(ByteArrayOutputStream docx) {
InputStream isFromFirstData = new ByteArrayInputStream(docx.toByteArray());
XWPFDocument document = new XWPFDocument(isFromFirstData);
PdfOptions options = PdfOptions.create();
//make new file in c:\temp\
OutputStream out = new FileOutputStream(new File("c:\\tmp\\HelloWord.pdf"));
PdfConverter.getInstance().convert(document, out, options);
//return byte array for return in http request.
ByteArrayOutputStream pdf = new ByteArrayOutputStream();
PdfConverter.getInstance().convert(document, pdf, options);
document.write(pdf);
document.close();
return pdf.toByteArray();
}
Upvotes: -1
Reputation: 616
You are missing some libraries.
I am able to run your code by adding the following libraries:
Apache POI 3.15 org.apache.poi.xwpf.converter.core-1.0.6.jar org.apache.poi.xwpf.converter.pdf-1.0.6.jar fr.opensagres.xdocreport.itext.extension-2.0.0.jar itext-2.1.7.jar ooxml-schemas-1.3.jar
I have successfully converted a 6 pages long Word document (.docx) with tables, images and various formatting.
Upvotes: 19