user1722476

Read, replace placeholders, and write a Word file with docx4j

I have a Word file that looks like this:

You don't need to understand the content; just look at my placeholders <env> and <applikationsabkürzung>. There are 10 pages with these placeholders, and I need to replace them with other content. The black and yellow boxes are company pictures, which I won't share.

I read through the whole docx4j documentation and, after some time, came up with the following code:

public void manipulateWord(String path, String env, String appl) {
    try {
        WordprocessingMLPackage wpml = WordprocessingMLPackage.load(new File(path));

        MainDocumentPart mdp = wpml.getMainDocumentPart();

        List<Object> content = mdp.getContent();

        // Include all Strings to replace
        HashMap<String, String> mappings = new HashMap<String, String>();
        mappings.put("<env>", env);
        mappings.put("<applikationsabkürzung>", appl);

        for (Object object : content) {
            Text textElement = (Text) object;
            String textToReplace = textElement.getValue();
            if (mappings.keySet().contains(textToReplace)) {
                textElement.setValue(mappings.get(textToReplace));
            }
        }

        wpml.save(new File("C:\\Users\\kristina\\Desktop\\outputfile.docx"));

    } catch (Docx4JException e) {
        LOG.error(e);
    }
}

Some explanation:

But when I run the method, nothing happens; my console just prints some info messages. If they're important, I'll edit the post, but I don't think they are.

So where is my mistake? Can it work like this? I'm close to despair...

Upvotes: 3

Views: 6331

Answers (1)

Ben

Reputation: 7597

MainDocumentPart.getContent() will return all OpenXml components within the main document flow (things like headers and footers have their own elements). Your code assumes that List<Object> content is a collection of Text elements, which is not necessarily the case. For example, a typical (simple) document structure would be like this:

P  // Paragraph element
    -> R  // Run element
        -> Text  // Text element

… so getContent() is, in all likelihood, going to spit out a load of P objects for a start.

There are a few ways to traverse docx files (see the main docx4j site for more), but one approach is shown in the method below. You can pass in the MainDocumentPart as the first Object, and Text.class as the object type to search for. This should then help identify all Text elements which contain one of your mapping values:

public List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
    List<Object> result = new ArrayList<Object>();
    if (obj instanceof JAXBElement)
        obj = ((JAXBElement<?>) obj).getValue();

    if (obj.getClass().equals(toSearch))
        result.add(obj);
    else if (obj instanceof ContentAccessor) {
        List<?> children = ((ContentAccessor) obj).getContent();
        for (Object child : children) {
            result.addAll(getAllElementFromObject(child, toSearch));
        }
    }

    return result;
}
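
Once the Text elements are collected, note that a Text value may contain a placeholder as part of a longer string, so an exact-match lookup like mappings.keySet().contains(...) can miss it. Below is a minimal sketch of the replacement step in plain Java, with no docx4j dependency. The class name PlaceholderReplacer and the method replacePlaceholders are hypothetical names chosen for illustration, and the sample values "prod" and "CRM" are made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of docx4j: replaces placeholders inside one text value.
class PlaceholderReplacer {

    // Replaces every occurrence of each placeholder key found anywhere in the text.
    static String replacePlaceholders(String text, Map<String, String> mappings) {
        if (text == null) {
            return null;
        }
        String result = text;
        for (Map.Entry<String, String> entry : mappings.entrySet()) {
            // String.replace substitutes literally (no regex), so '<' and '>' need no escaping.
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("<env>", "prod");                   // sample value, assumed
        mappings.put("<applikationsabkürzung>", "CRM");  // sample value, assumed
        System.out.println(replacePlaceholders("Deploy <applikationsabkürzung> to <env>", mappings));
        // prints: Deploy CRM to prod
    }
}
```

In the loop over the list returned by getAllElementFromObject(mdp, Text.class), each element could then be cast to Text and updated with text.setValue(replacePlaceholders(text.getValue(), mappings)). One caveat: Word sometimes splits a single placeholder across several runs, in which case no individual Text value contains the whole placeholder and this per-element replacement will not find it.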

Upvotes: 3
