Reputation: 127
I am trying to find a way to convert a docx file to XHTML.
I found XDocReport, which looks good, but I have run into some issues (and I am new to XDocReport).
According to their documentation on GitHub (here and here), I should be able to convert with this code:
String source = args[0];
String dest = args[1];
// 1) Create options DOCX to XHTML to select the right converter from the registry
Options options = Options.getFrom(DocumentKind.DOCX).to(ConverterTypeTo.XHTML);
// 2) Get the converter from the registry
IConverter converter = ConverterRegistry.getRegistry().getConverter(options);
// 3) Convert DOCX to (X)HTML; try-with-resources closes both streams
try (InputStream in = new FileInputStream(new File(source));
     OutputStream out = new FileOutputStream(new File(dest))) {
    converter.convert(in, out, options);
} catch (XDocConverterException | IOException e) {
    e.printStackTrace();
}
I am using these dependencies (tried different versions, like 2.0.2, 2.0.0, 1.0.6):
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.document.docx</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.template.freemarker</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
    <version>2.0.2</version>
</dependency>
My issues:
How can I handle these issues? (Or how can I convert docx to XHTML using Docx4j while keeping formatting, numbering, and images?)
Upvotes: 1
Views: 2704
Reputation: 61852
To convert *.docx to XHTML using XDocReport with Apache POI's XWPFDocument as the source, you will need XHTMLOptions. Those options can hold an ImageManager, which sets the path for images extracted from the XWPFDocument. Then XHTMLConverter is needed to do the conversion.
Complete example:
import java.io.*;

// needed jars: xdocreport-2.0.2.jar
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
import fr.opensagres.poi.xwpf.converter.core.ImageManager;

// needed jars: all Apache POI dependencies
import org.apache.poi.xwpf.usermodel.*;

public class DOCXToXHTMLXDocReport {

    public static void main(String[] args) throws Exception {
        String docPath = "./WordDocument.docx";
        String root = "./";
        String htmlPath = root + "WordDocument.html";

        // The ImageManager sets where images extracted from the document are stored
        XHTMLOptions options = XHTMLOptions.create().setImageManager(new ImageManager(new File(root), "images"));

        // try-with-resources closes the document and the output stream even if conversion fails
        try (XWPFDocument document = new XWPFDocument(new FileInputStream(docPath));
             FileOutputStream out = new FileOutputStream(htmlPath)) {
            XHTMLConverter.getInstance().convert(document, out, options);
        }
    }
}
This handles images properly.
But XDocReport is not yet able to handle page background colors of an XWPFDocument properly. It extracts and handles paragraph background colors, but not page background colors.
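Since the goal is XHTML rather than plain HTML, it may be worth checking that the generated file actually parses as XML. Below is a minimal sketch of such a check using only the JDK's built-in XML parser. The class name XhtmlCheck and the method isWellFormed are hypothetical helpers for illustration, not part of XDocReport or Apache POI, and the default path simply assumes the output file name from the example above. It checks well-formedness only, not validity against an XHTML DTD.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of XDocReport): checks XML well-formedness only.
class XhtmlCheck {

    // Returns true if the stream parses as well-formed XML, false otherwise.
    static boolean isWellFormed(InputStream in) throws IOException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Assumption: the JDK's bundled parser is in use, which supports this feature.
        // It avoids fetching an external DTD if the file declares a DOCTYPE.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(in);
            return true;
        } catch (SAXException e) {
            // Thrown for mismatched tags, unclosed elements, and similar problems.
            return false;
        }
    }

    // Convenience overload for checking a string held in memory.
    static boolean isWellFormed(String xml) throws IOException, ParserConfigurationException {
        return isWellFormed(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Assumed default: the output file written by the conversion example.
        String path = args.length > 0 ? args[0] : "./WordDocument.html";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            System.out.println(path + " well-formed: " + isWellFormed(in));
        }
    }
}
```

Run it with the path of the converted file as the only argument. A false result would suggest the output needs post-processing before it can be fed to XML tooling.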
Upvotes: 2