Reputation: 1
I'm struggling with the conversion of DOCX files to PDF.
In short, I'm selecting DOCX files from a database (as bytes), merging them with Apache POI, and then trying to convert the result to PDF using Docx4j, all in memory. (In production I can't work with files stored on local disk; local files are only for testing.)
The problem is that the DOCX files from the database aren't consistent; maybe some metadata or properties are missing.
This is the method that merges the DOCX files into a single DOCX file (manipulated as XML in memory):
public XWPFDocument mergeDocx(List<String> docxNames) throws Exception {
    List<FileData> fileData = repository.getDocxs(docxNames);
    ZipSecureFile.setMinInflateRatio(0);
    InputStream inputS = new ByteArrayInputStream(fileData.get(0).getData());
    OPCPackage opcPackage = OPCPackage.open(inputS);
    XWPFDocument xwpfDocument = new XWPFDocument(opcPackage);
    fileData.remove(0);
    if (!fileData.isEmpty()) {
        for (FileData fd : fileData) {
            inputS = new ByteArrayInputStream(fd.getData());
            opcPackage = OPCPackage.open(inputS);
            XWPFDocument xwpf = new XWPFDocument(opcPackage);
            CTBody bodyToAppend = xwpf.getDocument().getBody();
            xwpfDocument.getDocument().addNewBody().set(bodyToAppend);
        }
    }
    inputS.close();
    opcPackage.close();
    return xwpfDocument;
}
Both the final merged DOCX file and the individual ones selected from the DB are "broken" and don't work properly in the second method. To make it work, I have to write the merged file to local disk and pass it through a DOCX converter first.
public void toPdf(XWPFDocument docxDocument) throws Exception {
    // in
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    docxDocument.write(baos);
    docxDocument.close();
    byte[] bytes = baos.toByteArray(); // this is basically a byte array of an XML file, not a consistent DOCX one
    WordprocessingMLPackage ml = Docx4J.load(new ByteArrayInputStream(bytes));
    // out
    OutputStream output = new FileOutputStream("/Users/Santiago/Documents/test.pdf");
    Docx4J.toPDF(ml, output);
    output.flush();
    output.close();
}
The question is: is there any way to get a consistent DOCX file (maybe by adding some properties or applying some formatting) before it goes through the second method, without resorting to external tools like the web app I'm using to convert my "bad" DOCX file into a consistent one?
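One way to narrow down what "consistent" means here is to check, in memory, whether a byte array is a ZIP-based DOCX package at all. The sketch below uses only java.util.zip; DocxPackageCheck and its methods are hypothetical names (not part of POI or docx4j), and treating word/document.xml as the main part is an assumption that holds for typical Word and POI output but is not guaranteed by the format.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper; not part of Apache POI or docx4j.
final class DocxPackageCheck {

    // Reads the bytes as a ZIP archive and returns the entry names found.
    // An empty set means the bytes do not start with a ZIP entry (for example raw XML).
    static Set<String> entryNames(byte[] bytes) throws IOException {
        Set<String> names = new HashSet<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Assumption: a usable DOCX has the content-types part and the usual main document part.
    static boolean looksLikeDocx(byte[] bytes) throws IOException {
        Set<String> names = entryNames(bytes);
        return names.contains("[Content_Types].xml") && names.contains("word/document.xml");
    }
}
```

Calling DocxPackageCheck.looksLikeDocx on the bytes read from the database and on the bytes produced by docxDocument.write(baos) would show whether either one is plain XML rather than a package; it says nothing about whether the parts inside are valid.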
Upvotes: 0
Views: 247
Reputation: 15878
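Your merge code doesn't do what you think it does: calling addNewBody() on a document that already has a body does not append paragraphs to the existing body, and copying the CTBody XML does not bring along the styles, numbering, images, or relationships it refers to.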
If all you need is PDF output, then create the PDFs first, then merge them using PDFBox.
If you want to do other stuff with your merged docx, then you could use the commercial Docx4j Enterprise to do the merge.
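The first suggestion (convert each DOCX to PDF, then merge the PDFs) could be sketched roughly as follows, entirely in memory. This assumes PDFBox 2.x, where PDFMergerUtility accepts InputStream sources and mergeDocuments takes a MemoryUsageSetting (the 3.x API differs), and a docx4j setup in which Docx4J.toPDF already works as in the question. DocxToMergedPdf and convertAndMerge are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.List;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

// Hypothetical helper: converts each DOCX on its own, then merges the resulting PDFs.
final class DocxToMergedPdf {

    static byte[] convertAndMerge(List<byte[]> docxFiles) throws Exception {
        if (docxFiles == null || docxFiles.isEmpty()) {
            throw new IllegalArgumentException("at least one DOCX is required");
        }
        PDFMergerUtility merger = new PDFMergerUtility();
        for (byte[] docx : docxFiles) {
            // Same docx4j calls as in the question, applied to one source file at a time.
            WordprocessingMLPackage pkg = Docx4J.load(new ByteArrayInputStream(docx));
            ByteArrayOutputStream pdf = new ByteArrayOutputStream();
            Docx4J.toPDF(pkg, pdf);
            merger.addSource(new ByteArrayInputStream(pdf.toByteArray()));
        }
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        merger.setDestinationStream(merged);
        // Assumption: PDFBox 2.x signature; keeps all buffers in main memory.
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
        return merged.toByteArray();
    }
}
```

With the question's repository, the input list could be built from fileData by collecting fd.getData() for each entry, so no POI merge step is needed on this path. Whether docx4j can load the database files directly still depends on those files being valid packages.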
Upvotes: 1