Reputation: 334
I am using org.apache.poi.xwpf.converter.xhtml.XHTMLConverter class to convert docx to html. Below is my groovy code
public Map convert(String wordDocPath, String htmlPath,
Map conversionParams)
{
log.info("Converting word file "+wordDocPath)
try
{
...
String notificationWorkingFolder = "C:\tomcats\Notification\store\Notification1234"
FileInputStream fis = new FileInputStream(wordDocPath);
XWPFDocument document = new XWPFDocument(fis);
XHTMLOptions options = XHTMLOptions.create().URIResolver(new FileURIResolver(new File(notificationWorkingFolder)));
File htmlFile = new File(htmlPath);
OutputStream out = new FileOutputStream(htmlFile)
XHTMLConverter.getInstance().convert(document, out, options);
log.info("Converted to HTML file "+htmlPath)
return [success:true,htmlFileName:getFileName(htmlPath)]
}
catch(Exception e)
{
log.error("Exception :"+e.getMessage(),e)
return [success:false]
}
}
The above code is converting docx to html successfully, but if docx contains any images it puts <img src="C:\tomcats\Notification\store\Notification1234\word\media\image1.png">
but do not copy the image to that folder. As a result, when I open html tag, all images appears empty. Am I missing something in code? Is there a way to generate an image srouce link instead of absolute path, like <img src="http://localhost:8080/webapp/image1.png">
Upvotes: 1
Views: 3123
Reputation: 2475
Even with proper usage, it's worth noting that XHTMLConverter uses XHTMLMapper, which does not process headers, footers, or VML Images. Any images falling into those categories will be lost.
The PDFConverter is more fully featured, but also uses the GPL licensed library, iText.
Upvotes: 0
Reputation: 334
I got answer for first question from this link lychaox.com/java/poi/Word07toHtml.html. I had to add one line of code options.setExtractor(new FileImageExtractor(imageFolderFile));
to generate images.
Second question I resolved by pattern search and replacement.
Upvotes: 1