Reputation: 1857
My current goal is a little complex, but I'll try to explain it as best I can. We have a piece of software that has been generating XML logs of all usage for the past few months. Someone else has parsed out of this data what they deemed necessary for documentation and placed it all into a readable HTML format.
My job is to find a way to link the readable HTML files they generated with a pre-existing Word (.docx) document. I currently have a NAnt script that reads through the directory that contains the logs and creates an XML document with the format:
<root>
  <HTML address=...>
    <ProductName name=...>
      <FunctionName name=...>
      </FunctionName>
    </ProductName>
  </HTML>
</root>
The Word document itself contains tables that hold the function names. These tables sit underneath headers that contain the product name. I need to wrap a link to the address associated with each function around the function name inside the table, so that someone reading the documentation can click on a function name and see that function's documentation.
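To make the mapping concrete, here is a minimal Python sketch of turning that index into a lookup table keyed by product and function name. Only the element and attribute names come from the format above; the sample values and the `build_link_table` name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape described above; the attribute values
# are invented for illustration only.
SAMPLE_INDEX = """
<root>
  <HTML address="docs/widget_open.html">
    <ProductName name="Widget">
      <FunctionName name="Open"></FunctionName>
    </ProductName>
  </HTML>
  <HTML address="docs/widget_close.html">
    <ProductName name="Widget">
      <FunctionName name="Close"></FunctionName>
    </ProductName>
  </HTML>
</root>
"""


def build_link_table(index_xml):
    """Map (product name, function name) to the HTML address."""
    table = {}
    root = ET.fromstring(index_xml.strip())
    for html in root.findall("HTML"):
        address = html.get("address")
        for product in html.findall("ProductName"):
            for function in product.findall("FunctionName"):
                key = (product.get("name"), function.get("name"))
                table[key] = address
    return table


link_table = build_link_table(SAMPLE_INDEX)
print(link_table[("Widget", "Open")])
```

Keying on both names reflects the layout of the Word document, where the product name is in the header and the function name is in the table beneath it.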
I have no experience programmatically modifying Word documents, so I would really appreciate assistance with this, as it seems like a fairly complex procedure. I can easily reorder the XML nodes if that would simplify the process in any way.
Things I've researched so far:
Before looking into the docx format, I had wanted to write another script that would simply search for the associated data and then wrap hyperlink tags around it. Unfortunately, once I looked into docx, the format turned out to be much more complicated than that.
After that I decided to look at using C# through Visual Studio 2010. Unfortunately, as I have no experience using C# (I have only used C and C++), it was fairly confusing. I've spent a few days looking for guides and references, but it's all very scattered and I can't seem to find what I'm looking for.
Upvotes: 1
Views: 3238
Reputation: 15878
Three techniques for your kit bag:
1. Custom XML data binding. With this you can inject an XML document into your docx and have the data (linked via XPath) show up automatically. It won't appear inside a hyperlink though, so this might not work for you, although you could run an AutoOpen macro when the docx is first opened in Word to convert the text to hyperlinks.
2. AltChunk. With this you can include HTML within your docx. You'd need to modify the docx though (see 3 below).
3. Flat OPC XML. This is a representation of the docx as a single XML file, which Word 2007 or later can happily read or write. Using this, you can string-replace the contents of a hyperlink using your tool of choice. You could also use this representation to make it easy to inject contents into AltChunks if you wanted.
The slight challenge with replacing the hyperlinks is that you have to do it in two places: first, the link text displayed to the user in the document itself (document.xml), and second, the destination URL (in the relationships part). The two are tied together by a relationship ID: the r:id on the hyperlink element matches the Id of the relationship.
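As a rough illustration of those two places, here is a minimal Python sketch that produces both fragments for one link. The function name `hyperlink_markup`, the `rId3` value, and the example URL are hypothetical; the sketch assumes the `w`, `r` and `rel` prefixes are declared as in the Flat OPC example in this answer.

```python
from xml.sax.saxutils import escape, quoteattr

# Relationship type used for hyperlinks in the relationships part.
HYPERLINK_REL_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def hyperlink_markup(rel_id, url, text):
    """Return (document.xml fragment, relationship entry) for one link."""
    # Place 1, document.xml: the visible text, pointing at rel_id.
    run = (
        f"<w:hyperlink r:id={quoteattr(rel_id)}>"
        f"<w:r><w:t>{escape(text)}</w:t></w:r>"
        "</w:hyperlink>"
    )
    # Place 2, word/_rels/document.xml.rels: the destination URL.
    relationship = (
        f"<rel:Relationship Id={quoteattr(rel_id)} "
        f'Type="{HYPERLINK_REL_TYPE}" '
        f"Target={quoteattr(url)} "
        'TargetMode="External"/>'
    )
    return run, relationship


# Hypothetical values, for illustration only.
run, relationship = hyperlink_markup(
    "rId3", "http://example.com/a?x=1&y=2", "Open"
)
print(run)
print(relationship)
```

The run here carries no character style, so the link text would probably look like ordinary text until a hyperlink style is applied. Each link needs a relationship ID that is unique within document.xml.rels.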
If you use AltChunk, you only have to replace in a single place. I'm not sure how long your documents are, or whether you'd run into performance issues with hundreds of AltChunks (even if each one basically just contains a single hyperlink).
Here is an example of a Flat OPC XML file containing an HTML AltChunk (which you ought to be able to save and then drag onto Word, or open via File > Open):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <pkg:part pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:name="/_rels/.rels">
    <pkg:xmlData>
      <rel:Relationships xmlns:rel="http://schemas.openxmlformats.org/package/2006/relationships">
        <rel:Relationship Id="rId1" Target="word/document.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>
      </rel:Relationships>
    </pkg:xmlData>
  </pkg:part>
  <pkg:part pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" pkg:name="/word/document.xml">
    <pkg:xmlData>
      <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
        <w:body>
          <w:altChunk r:id="rId2"/>
          <w:sectPr>
            <w:pgSz w:code="1" w:h="15840" w:w="12240"/>
            <w:pgMar w:bottom="1440" w:left="1440" w:right="1440" w:top="1440"/>
          </w:sectPr>
        </w:body>
      </w:document>
    </pkg:xmlData>
  </pkg:part>
  <pkg:part pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:name="/word/_rels/document.xml.rels">
    <pkg:xmlData>
      <rel:Relationships xmlns:rel="http://schemas.openxmlformats.org/package/2006/relationships">
        <rel:Relationship Id="rId2" Target="../chunk.html" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk"/>
      </rel:Relationships>
    </pkg:xmlData>
  </pkg:part>
  <pkg:part pkg:compression="store" pkg:contentType="text/html" pkg:name="/chunk.html">
    <pkg:binaryData>PGh0bWw+PGJvZHk+PHA+PGEgaHJlZj0iaHR0cDovL3N0YWNrb3ZlcmZsb3cuY29tIj5TdGFja092ZXJmbG93PC9hPjwvcD48L2JvZHk+PC9odG1sPg==</pkg:binaryData>
  </pkg:part>
  <pkg:part pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:name="/_rels/chunk.html.rels">
    <pkg:xmlData>
      <rel:Relationships xmlns:rel="http://schemas.openxmlformats.org/package/2006/relationships"/>
    </pkg:xmlData>
  </pkg:part>
</pkg:package>
The binary data is the following HTML, base64 encoded (as required by the Flat OPC XML format):
<html><body><p><a href="http://stackoverflow.com">StackOverflow</a></p></body></html>
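If you generate the chunks from a script, the encoding step might look something like this minimal Python sketch. The `chunk_part` helper is a hypothetical name; the attributes it writes are copied from the example part above, and it assumes the result is pasted inside a `pkg:package` element that declares the `pkg` prefix.

```python
import base64


def chunk_part(html, part_name="/chunk.html"):
    """Wrap an HTML string as a Flat OPC binary part (base64 payload)."""
    encoded = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return (
        '<pkg:part pkg:compression="store" pkg:contentType="text/html" '
        f'pkg:name="{part_name}">'
        f"<pkg:binaryData>{encoded}</pkg:binaryData>"
        "</pkg:part>"
    )


html = (
    '<html><body><p><a href="http://stackoverflow.com">'
    "StackOverflow</a></p></body></html>"
)
part = chunk_part(html)
print(part)
```

With several chunks, each would need its own part name and its own relationship entry in document.xml.rels.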
Upvotes: 2