David

Reputation: 87

Extra space added before and after a CSS-styled custom tag (docx4j/Jsoup)

I want to convert an HTML-formatted string into a .docx file. I currently use Jsoup to turn my HTML into clean XHTML, and docx4j (latest version) to convert that XHTML to docx.

I had a problem with text color because the tag I was using for it is not supported by docx4j. I reformatted my string to apply the color through a CSS style on a new, arbitrarily named tag ("si"). The color now works, but some space is added before and after the colored text.

Here is the code.

import java.io.File;
import java.util.List;

import org.docx4j.Docx4J;
import org.docx4j.convert.in.xhtml.XHTMLImporter;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class test {

    public static void main(String[] args) throws Docx4JException {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
        String outputfilepath = "test.docx";

        // d: plain text before the <si> element; e: <si> at the start of the paragraph
        String d = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><style type=\"text/css\">body{font-family:Arial; font-size:120%;}si{color:#0000FF;padding-right: 0;margin:0;}si:before { padding-right: 0;margin:0;indent:0; }</style></head><body><p>blabla<si><strong>blaebdqzd</strong>qdzd</si>zdqzdq</p></body></html>";
        String e ="<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><style type=\"text/css\">body{font-family:Arial; font-size:120%;}si{color:#0000FF;padding-right: 0;margin:0;}si:before { padding-right: 0;margin:0;indent:0; }</style></head><body><p><si><strong>blaebdqzd</strong>qdzd</si>zdqzdq</p></body></html>";


        // Convert the first sample and append the resulting paragraphs to the document
        XHTMLImporter importer = new XHTMLImporterImpl(wordMLPackage);
        String text = htmlToXhtml(d);
        List<Object> content = importer.convert(text, null);
        wordMLPackage.getMainDocumentPart().getContent().addAll(content);

        // Same for the second sample, using a fresh importer
        importer = new XHTMLImporterImpl(wordMLPackage);
        text = htmlToXhtml(e);
        content = importer.convert(text, null);
        wordMLPackage.getMainDocumentPart().getContent().addAll(content);

        // Write the package to test.docx
        Docx4J.save(wordMLPackage, new File(outputfilepath), Docx4J.FLAG_NONE);
    }

    // Parse with Jsoup and re-serialize as XML so docx4j receives well-formed XHTML
    private static String htmlToXhtml(final String html) {
        final Document document = Jsoup.parse(html);
        document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
        return document.html();
    }
}

Could someone help me, please?

Upvotes: 0

Views: 460

Answers (1)

JasonPlutext

Reputation: 15878

Two things you could try:

  1. For your custom element "si", declare whether it is block or inline (with the CSS display property).
  2. Use a span instead.
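A minimal sketch of both suggestions, using only the JDK so it runs without docx4j or Jsoup on the classpath. The class SiToSpan, the helper siToSpan and the constant INLINE_SI_CSS are hypothetical names, and the regex assumes "si" elements are never nested and carry no attributes. Since the question already parses the input with Jsoup, renaming the elements on the parsed Document (select("si"), then tagName("span")) would be more robust than a regex. Whether either change actually removes the extra space depends on how the XHTML importer lays the element out; the sketch only prepares the markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SiToSpan {

    // Suggestion 1 (assumed CSS): keep the custom "si" element but declare it
    // inline, so the importer does not have to guess its display type.
    public static final String INLINE_SI_CSS = "si{display:inline;color:#0000FF;}";

    // Non-greedy match of a bare <si>...</si> pair; DOTALL lets content span lines.
    // Assumption: <si> is never nested and never has attributes.
    private static final Pattern SI = Pattern.compile("<si>(.*?)</si>", Pattern.DOTALL);

    // Suggestion 2 (hypothetical helper): rewrite each <si>...</si> into a
    // standard inline <span> that carries the color as an inline style.
    public static String siToSpan(String html, String color) {
        Matcher m = SI.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String span = "<span style=\"color:" + color + "\">" + m.group(1) + "</span>";
            // quoteReplacement keeps '$' or '\' in the content from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(span));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "<p>blabla<si><strong>blaebdqzd</strong>qdzd</si>zdqzdq</p>";
        // prints <p>blabla<span style="color:#0000FF"><strong>blaebdqzd</strong>qdzd</span>zdqzdq</p>
        System.out.println(siToSpan(body, "#0000FF"));
        System.out.println(INLINE_SI_CSS);
    }
}
```

With the span variant the "si" rules in the style block are no longer needed, because the color travels on each span.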

More generally, docx4j's XHTML-to-docx import relies on Flying Saucer (xhtmlrenderer), so searching for how Flying Saucer handles a given CSS construct can often turn up useful information.

Upvotes: 0
