Modifying OpenXML Word Document via API AND Text Manipulation

Question

I am working on prototypes to replace an existing Word-automation-based system for template rendering, and am currently evaluating the OpenXML SDK. The template library is quite extensive (150-200 templates, maintained by non-technical resources), so I am hoping to avoid any template changes other than upgrading from the 1997-2003 Word format.

Tags currently embedded need to be replaced sometimes with text, and sometimes with images, charts, etc. (assume for now that all charts will be rendered to images prior to insertion).

I am able to do the straight text replacement using a technique similar to the one described in this MSDN article. My scenario is slightly more complex but looks something like this:

    public void ReplaceFirstOccurrenceWithText(string tagBody, string replacement)
    {
        var modifiedText = GetCurrentText();
        modifiedText = modifiedText.ReplaceFirst(tagBody, XmlEncoder.Encode(replacement));
        using (var sw = new StreamWriter(document.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(modifiedText);
        }
    }

    public string GetCurrentText()
    {
        using (var reader = new StreamReader(document.MainDocumentPart.GetStream()))
        {
            return reader.ReadToEnd();
        }
    }
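
Note that `ReplaceFirst` is not a built-in `string` method, so it is presumably a custom extension method. A minimal sketch of what such a helper might look like is below; the class name and the ordinal comparison are assumptions, not the code actually used here:

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical helper: replaces only the first occurrence of `search`
    // in `source`, using an ordinal (culture-insensitive) comparison.
    public static string ReplaceFirst(this string source, string search, string replacement)
    {
        if (string.IsNullOrEmpty(search))
        {
            return source; // nothing to look for
        }

        int index = source.IndexOf(search, StringComparison.Ordinal);
        if (index < 0)
        {
            return source; // tag not present, leave the text unchanged
        }

        return source.Substring(0, index)
             + replacement
             + source.Substring(index + search.Length);
    }
}
```

With this sketch, `"a{{x}}b{{x}}".ReplaceFirst("{{x}}", "1")` leaves the second tag in place, which is the behaviour the method name implies.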

The reason I don't save the string is that I want the underlying document to stay up to date so I can add images through the normal API, using the technique described in another MSDN article:

    public void ReplaceFirstOccurrenceWithImage(string tagBody, byte[] replacement)
    {
        ReplaceFirstOccurrenceWithText(tagBody, "IMAGE TAG WAS HERE!");
        var main = document.MainDocumentPart;
        var imagePart = main.AddImagePart(ImagePartType.Gif); // TODO: sniff the real type by loading the bytes into a bitmap
        using (var imageStream = new MemoryStream(replacement))
        {
            imagePart.FeedData(imageStream);
        }

        ImageInserter.AddImageToBody(document, main.GetIdOfPart(imagePart));
    }

Where ImageInserter is literally a copy/paste of the code in that article (I realize these abstractions are not the best but I'm just trying to get anything to work at this point).

Here is where it gets hairy - the document APPEARS to be staying in sync. The image is the first tag getting replaced, and the text replacement for the tag works, as does adding the image at the bottom of the document. My problem is that subsequent text replacement does not seem to work at all after this point - all the other tags remain in the document. However, if I set a breakpoint in the text replacement function, each call to GetCurrentText() returns the correct result (text with the tags up to that point having been replaced). But when I save the document, it is saved with only the first replacement having been done.

Has anyone run into anything like this? Next step is going to be to try a phased approach (resolve all tags, run straight text replacement first, then do all image replacements) but I feel like whatever is wrong currently will remain a problem regardless of order.
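
One possible explanation, offered as an assumption and not a confirmed diagnosis: once `MainDocumentPart.Document` is accessed (as the image-insertion code does), the SDK keeps an in-memory DOM for the part and writes that DOM back when the package is saved or closed, which would overwrite later edits made directly to the part stream. The sketch below assumes Open XML SDK 2.x and shows one way to keep the two views in step: flush the DOM before reading the raw XML, and reload it after writing. `MainPartSync`, `ReadXml` and `WriteXml` are hypothetical names.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class MainPartSync
{
    // Hypothetical helper: serialize any pending DOM changes into the part,
    // then read the raw XML back as a string.
    public static string ReadXml(WordprocessingDocument document)
    {
        var main = document.MainDocumentPart;
        main.Document.Save(); // write the cached DOM to the part stream

        using (var reader = new StreamReader(main.GetStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical helper: overwrite the part stream with new XML, then
    // re-parse it so the cached DOM matches what is now in the stream.
    public static void WriteXml(WordprocessingDocument document, string xml)
    {
        var main = document.MainDocumentPart;

        using (var writer = new StreamWriter(main.GetStream(FileMode.Create)))
        {
            writer.Write(xml);
        }

        main.Document.Load(main); // discard the stale DOM and reload from the part
    }
}
```

If this assumption holds, routing `GetCurrentText()` and the write in `ReplaceFirstOccurrenceWithText` through helpers like these would let stream-level text edits and DOM-level image edits be interleaved in any order.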

Micah Armantrout · Accepted Answer

If I were you, I would look into

http://docx.codeplex.com

It's a lot more straightforward for general stuff. Maybe you are doing something more complex than the library can handle, but I would definitely take a look at it.
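
For a rough idea of what that could look like, here is a minimal sketch. It assumes the CodePlex-era DocX release (namespace `Novacode`); the file names and the `{{Name}}` tag are hypothetical placeholders, and the exact overloads may differ between versions of the library:

```csharp
using Novacode; // namespace of the CodePlex-era DocX releases (assumption)

public static class DocXTemplateSketch
{
    public static void Render()
    {
        // "template.docx", "chart.png" and "output.docx" are placeholder paths.
        using (DocX doc = DocX.Load("template.docx"))
        {
            // Replace a text tag throughout the document.
            doc.ReplaceText("{{Name}}", "Jane Doe");

            // Add an image to the package and append it in a new paragraph
            // at the end of the document.
            Image image = doc.AddImage("chart.png");
            Picture picture = image.CreatePicture();
            Paragraph paragraph = doc.InsertParagraph();
            paragraph.AppendPicture(picture);

            doc.SaveAs("output.docx");
        }
    }
}
```

This appends the picture at the end of the document, which mirrors the current behaviour described in the question. Placing it at the tag's position would need additional work.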
