How to 'tag' regions of a Word document to make it easy to add text to them with Office Open XML?

My application needs to create a richly formatted Word document for the user. The process starts with two documents:

  1. A Word document template.
  2. A Word document that serves as a ‘database’ of paragraphs that can be added to the template document based on user input.

Based on user input, the selected paragraphs will be copied into the Word template, creating the final Word document.

I think the needs are:

  1. Create Word templates with 'tagged' regions. For example, some sort of tag in the template serves as the target for the first paragraph selected by the user.
  2. Code to find the 'tags' in the Word template and replace them with formatted text from the 'database' Word document.

Could anyone suggest how to 'tag' regions of the Word Template that can then be easily found programmatically?

Thanks, Matt

Upvotes: 1

Views: 4006

Answers (3)

JasonPlutext

Reputation: 15863

Tagging a document region

The neatest way to "tag" a region of a document is to use a content control.

If you use a block-level "rich text" content control, it can contain block-level content such as paragraphs and tables, as well as nested content controls.

Here's a simple example of a rich text content control (with some useful properties set).

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" >
  <w:body>
    <w:p>
      <w:r>
        <w:t>An ordinary top level p</w:t>
      </w:r>
    </w:p>
    <w:sdt>
      <w:sdtPr>
        <w:alias w:val="my title"/>
        <w:tag w:val="my tag"/>
        <w:id w:val="1508253281"/>
        <w:lock w:val="sdtLocked"/>
      </w:sdtPr>
      <w:sdtContent>
        <w:p >
          <w:r>
            <w:t>This is a paragraph in a rich text content control.</w:t>
          </w:r>
        </w:p>
        <w:p >
          <w:r>
            <w:t>Another paragraph </w:t>
          </w:r>
        </w:p>
        <w:tbl>
          <!-- table content -->
        </w:tbl>

      </w:sdtContent>
    </w:sdt>

  </w:body>
</w:document>

Because a content control's content is inside its sdtContent element, content controls are easy to manipulate from an XML point of view. (Compare bookmarks, for example, which are delimited by separate bookmarkStart and bookmarkEnd elements that can have different parent elements!)
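To make that concrete, here is a minimal sketch in Python (standard library only, not the Open XML SDK) that finds a content control by its w:tag value and reads the paragraphs directly inside its sdtContent. It assumes you have already extracted word/document.xml from the docx; the function names are hypothetical helpers, not part of any Open XML library.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in the XML above).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def find_sdt_by_tag(root, tag_value):
    """Return the first w:sdt whose w:sdtPr/w:tag has w:val == tag_value, else None."""
    for sdt in root.iter(f"{{{W_NS}}}sdt"):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        if tag is not None and tag.get(f"{{{W_NS}}}val") == tag_value:
            return sdt
    return None


def paragraph_texts(sdt):
    """Text of each w:p that is a direct child of the control's w:sdtContent."""
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in sdt.findall("w:sdtContent/w:p", NS)
    ]


# Trimmed copy of the example document above (w:lock and the empty w:tbl omitted).
DOCUMENT_XML = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>An ordinary top level p</w:t></w:r></w:p>
    <w:sdt>
      <w:sdtPr>
        <w:alias w:val="my title"/>
        <w:tag w:val="my tag"/>
        <w:id w:val="1508253281"/>
      </w:sdtPr>
      <w:sdtContent>
        <w:p><w:r><w:t>This is a paragraph in a rich text content control.</w:t></w:r></w:p>
        <w:p><w:r><w:t>Another paragraph </w:t></w:r></w:p>
      </w:sdtContent>
    </w:sdt>
  </w:body>
</w:document>"""

root = ET.fromstring(DOCUMENT_XML)
sdt = find_sdt_by_tag(root, "my tag")
print(paragraph_texts(sdt))
# ['This is a paragraph in a rich text content control.', 'Another paragraph ']
```

Locating a bookmarked region would instead mean walking from a bookmarkStart to the matching bookmarkEnd across possibly different parents, which is exactly the extra work the sdtContent wrapper avoids.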

Once you have settled on content controls as your solution to need #1, you have a choice to make regarding need #2.

Replacing the content control content with formatted text

Inserting arbitrary content is a little complex, since you have to take care of relationships to other parts (for example, images and hyperlinks). I'd suggest using code that merges docx files: see Merge multiple word documents into one Open Xml. The Document Builder approach is more robust than altChunk, since altChunk requires the document to be opened in an altChunk-aware processor (e.g. Word or Plutext's) that converts the altChunk into normal docx content.
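The relationship problem can be seen in a deliberately naive Python sketch (standard library only; `fill_sdt` and `uses_relationships` are hypothetical names). It copies paragraphs from a 'database' document into a control's sdtContent, but refuses any block carrying an r:* attribute, such as r:id on a hyperlink or r:embed on an image, because those IDs point into the source document's relationships and would dangle in the template. It also does not copy styles, numbering or fonts the blocks rely on, which is why a proper docx merge tool is the more robust route.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS = {"w": W_NS}


def uses_relationships(block):
    """True if block or any descendant has an r:* attribute (r:id, r:embed, ...),
    i.e. it points at another part via the source document's .rels file."""
    return any(
        name.startswith(f"{{{R_NS}}}")
        for node in block.iter()
        for name in node.attrib
    )


def fill_sdt(sdt, blocks):
    """Replace the children of sdt's w:sdtContent with copies of blocks.

    Naive on purpose: refuses relationship references and does not copy
    styles or numbering definitions the blocks may depend on."""
    for block in blocks:
        if uses_relationships(block):
            raise ValueError("block references another part; use a docx merge tool")
    content = sdt.find("w:sdtContent", NS)
    for child in list(content):
        content.remove(child)
    for block in blocks:
        content.append(copy.deepcopy(block))


# Hypothetical template slot and 'database' document, for illustration only.
TEMPLATE_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:sdt>'
    '<w:sdtPr><w:tag w:val="slot1"/></w:sdtPr>'
    '<w:sdtContent><w:p><w:r><w:t>placeholder</w:t></w:r></w:p></w:sdtContent>'
    '</w:sdt></w:body></w:document>'
)
DATABASE_XML = (
    f'<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}"><w:body>'
    '<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>A bold clause</w:t></w:r></w:p>'
    '<w:p><w:hyperlink r:id="rId5"><w:r><w:t>a link</w:t></w:r></w:hyperlink></w:p>'
    '</w:body></w:document>'
)

template = ET.fromstring(TEMPLATE_XML)
database_paragraphs = ET.fromstring(DATABASE_XML).findall("w:body/w:p", NS)
slot = template.find(".//w:sdt", NS)
fill_sdt(slot, database_paragraphs[:1])      # bold paragraph: accepted
try:
    fill_sdt(slot, database_paragraphs[1:])  # hyperlink paragraph: refused
except ValueError as err:
    print(err)
```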

Alternatively, if you can assume the docx will be opened in Word 2013, you can use w15 rich text data binding. You put your formatted content in a custom XML part (as Flat OPC XML), and Word will automatically update the document with that content.

To get started with this, consider the following sample XML:

Sample XML

<myxml>
  <someelement>blagh</someelement>
  <yourdb>
    <content1>FLAT-OPC</content1>
  </yourdb>
</myxml>

Upload it to this service I wrote and, as described in this blog post, it will give you back a docx containing a content control with a w15:dataBinding.

Resulting content control

<w:sdt>
    <w:sdtPr>
        <w15:dataBinding w:prefixMappings="" w:xpath="/myxml[1]/yourdb[1]/content1[1]" w:storeItemID="{115f7b60-1982-4ec7-afc5-28d28886db4b}"/>
        <w:richText/>
    </w:sdtPr>
    <w:sdtContent>
        <w:p>
            <w:r>
                <w:t>Rich Word content can go here</w:t>
            </w:r>
        </w:p>
    </w:sdtContent>
</w:sdt>
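On the producing side, your code writes the Flat OPC string into the element that w:xpath points at. Here is a minimal Python sketch (`set_bound_value` is a hypothetical name) assuming the custom XML part has no namespaces, matching w:prefixMappings="" above, and that the xpath uses only [n] position predicates. It only edits the part's XML as a string; storing it back into the docx package (typically a customXml/itemN.xml part) is not shown, and `FLAT_OPC_STUB` is a placeholder, not a package Word could render.

```python
import xml.etree.ElementTree as ET


def set_bound_value(custom_xml, xpath, value):
    """Set the text of the element addressed by a simple absolute w:xpath
    (e.g. /myxml[1]/yourdb[1]/content1[1]) and return the updated custom XML.

    Assumes no namespaces and only [n] position predicates."""
    root = ET.fromstring(custom_xml)
    root_step, *rest = xpath.strip("/").split("/")
    if root_step.split("[")[0] != root.tag:
        raise ValueError(f"root element {root.tag!r} does not match {root_step!r}")
    # ElementTree's limited XPath understands relative paths like yourdb[1]/content1[1].
    target = root.find("/".join(rest)) if rest else root
    if target is None:
        raise ValueError(f"nothing in the custom XML matches {xpath!r}")
    target.text = value  # serialized with <, > and & escaped as entities
    return ET.tostring(root, encoding="unicode")


SAMPLE = ("<myxml><someelement>blagh</someelement>"
          "<yourdb><content1>FLAT-OPC</content1></yourdb></myxml>")
# Stand-in only: a real value would be a complete Flat OPC package.
FLAT_OPC_STUB = '<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"/>'

updated = set_bound_value(SAMPLE, "/myxml[1]/yourdb[1]/content1[1]", FLAT_OPC_STUB)
print(updated)
```

Letting the XML library do the escaping avoids hand-building the &lt;...&gt; text seen in the custom XML part after Word edits it.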

After you've edited this content in Word 2013, inspect the custom XML part:

CustomXML part content

<myxml>
  <someelement>blagh</someelement>
  <yourdb>
    <content1>
      &lt;?xml version="1.0" standalone="yes"?&gt;
      &lt;?mso-application progid="Word.Document"?&gt;
      &lt;pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"&gt;&lt;pkg:part pkg:name="/_rels/.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="512"&gt;&lt;pkg:xmlData&gt;...&lt;/pkg:xmlData&gt;&lt;/pkg:part&gt;&lt;/pkg:package&gt;
    </content1>
  </yourdb>
</myxml>

You can see the element now contains escaped Flat OPC XML.

The beauty of this is:

  1. the content is self-contained; it has everything necessary for it to be rendered (i.e. all styles, relationships, etc.)
  2. the binding is bidirectional: the user will see your database content when they open the document in Word 2013, and if they are allowed to edit that content, any changes they make will be reflected in the Custom XML part (so you can easily save the modified content to a database if you like)
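Reading the user's edits back is then a matter of parsing the custom XML part again. This Python sketch (`read_bound_package` is a hypothetical name, standard library only) unescapes the Flat OPC held in content1 and maps each pkg:part name to its XML payload. The one-part package built for the usage example is a stand-in, not a complete document as Word would save it.

```python
import xml.etree.ElementTree as ET

PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_bound_package(custom_xml, path="yourdb/content1"):
    """Parse the Flat OPC stored (escaped) in the bound element and return
    {part name: root element of its pkg:xmlData, or None for binary parts}."""
    node = ET.fromstring(custom_xml).find(path)
    if node is None or not (node.text or "").strip():
        return {}
    # node.text already has &lt;...&gt; turned back into markup by the parser.
    package = ET.fromstring(node.text.strip())
    parts = {}
    for part in package.findall(f"{{{PKG_NS}}}part"):
        xml_data = part.find(f"{{{PKG_NS}}}xmlData")
        payload = xml_data[0] if xml_data is not None and len(xml_data) else None
        parts[part.get(f"{{{PKG_NS}}}name")] = payload
    return parts


# Hypothetical minimal package with a single main document part.
flat_opc = (
    '<?xml version="1.0" standalone="yes"?>'
    '<?mso-application progid="Word.Document"?>'
    f'<pkg:package xmlns:pkg="{PKG_NS}">'
    '<pkg:part pkg:name="/word/document.xml" pkg:contentType='
    '"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">'
    f'<pkg:xmlData><w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
    '<w:t>Edited in Word</w:t></w:r></w:p></w:body></w:document></pkg:xmlData>'
    '</pkg:part></pkg:package>'
)
myxml = ET.Element("myxml")
ET.SubElement(ET.SubElement(myxml, "yourdb"), "content1").text = flat_opc
custom_xml = ET.tostring(myxml, encoding="unicode")  # escaped, as in the part above

parts = read_bound_package(custom_xml)
document = parts["/word/document.xml"]
print("".join(t.text for t in document.iter(f"{{{W_NS}}}t")))  # Edited in Word
```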

Upvotes: 2

Tarveen

Reputation: 161

You can use content controls for this purpose. Content controls have a Tag property, which you can set to a unique value and then use to access the control programmatically. Here is a link that can get you started.
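As a rough illustration of accessing controls by tag, this Python sketch opens a .docx with zipfile, reads word/document.xml, and groups content controls by their w:tag value, which also makes it easy to check that the tags really are unique. The names are hypothetical; only the main document part is scanned (headers, footers and footnotes live in other parts), and the in-memory "docx" in the usage example is a deliberately incomplete stand-in.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def content_control_tags(docx):
    """Map every w:tag value in word/document.xml to the w:sdt elements using it.

    docx may be a path or a binary file object."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    tags = {}
    for sdt in root.iter(f"{{{W_NS}}}sdt"):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        if tag is not None:
            tags.setdefault(tag.get(f"{{{W_NS}}}val"), []).append(sdt)
    return tags


def sdt_xml(tag):
    # Hypothetical helper: an empty rich text control carrying the given tag.
    return (f'<w:sdt><w:sdtPr><w:tag w:val="{tag}"/></w:sdtPr>'
            '<w:sdtContent><w:p/></w:sdtContent></w:sdt>')


# Stand-in .docx holding only word/document.xml, the one part read above
# (a real package also needs [Content_Types].xml, _rels/.rels, and so on).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    body = sdt_xml("intro") + sdt_xml("terms") + sdt_xml("intro")
    package.writestr("word/document.xml",
                     f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>')
buffer.seek(0)

tags = content_control_tags(buffer)
duplicates = sorted(name for name, sdts in tags.items() if len(sdts) > 1)
print(sorted(tags), duplicates)  # ['intro', 'terms'] ['intro']
```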

Upvotes: 0

Josh

Reputation: 1744

One way would be to use merge fields in a Word template. They're easy to add, and you can manipulate them programmatically with the Open XML SDK.

Some info on how to get started.
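One common form of merge field in the XML is a simple field, e.g. a w:fldSimple element with w:instr=" MERGEFIELD FirstName " wrapping a run that holds the placeholder result. Here is a minimal Python sketch, using ElementTree rather than the Open XML SDK (`fill_simple_merge_fields` is a hypothetical name), that replaces such fields with plain runs. It assumes simple fields only (merge fields can also be stored as complex fields built from w:fldChar and w:instrText runs, which this does not handle) and unquoted field names, and it drops the result run's formatting.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def fill_simple_merge_fields(root, values):
    """Replace each w:fldSimple MERGEFIELD whose name is in values with a plain run.

    Complex fields (w:fldChar / w:instrText) are left untouched.
    Returns the number of fields replaced."""
    replaced = 0
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != W + "fldSimple":
                continue
            tokens = child.get(W + "instr", "").split()
            if len(tokens) < 2 or tokens[0].upper() != "MERGEFIELD":
                continue
            name = tokens[1].strip('"')
            if name not in values:
                continue
            run = ET.Element(W + "r")
            text = ET.SubElement(run, W + "t")
            text.text = values[name]
            text.set(XML_SPACE, "preserve")  # keep any leading/trailing spaces
            parent.remove(child)
            parent.insert(index, run)  # same position, so later indexes stay valid
            replaced += 1
    return replaced


# Hypothetical template paragraph: "Dear «FirstName»," using a simple field.
TEMPLATE_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">Dear </w:t></w:r>'
    '<w:fldSimple w:instr=" MERGEFIELD FirstName \\* MERGEFORMAT ">'
    '<w:r><w:t>«FirstName»</w:t></w:r></w:fldSimple>'
    '<w:r><w:t>,</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

root = ET.fromstring(TEMPLATE_XML)
count = fill_simple_merge_fields(root, {"FirstName": "Matt"})
paragraph = root.find(f"{W}body/{W}p")
print(count, "".join(t.text for t in paragraph.iter(W + "t")))  # 1 Dear Matt,
```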

Upvotes: 1
