Alex

Reputation: 567

How to add comments to a .docx XML

At work, we have a Word document that we edit all the time and pass on to another team to tell them how to perform some tasks. Since I don't like mindlessly filling out data, and I always look for ways to simplify the tasks I have to do, I decided to automate this process. After considering a few methods (such as generating a Word document from scratch or editing an existing document), I decided to edit the document in place.

I have inserted special tags into the document (specifically, they take the form [SOME_NAME_HERE]). I extract the .docx to a folder, which gives me all of the XML documents inside it, then parse the document.xml file and replace each tag with the value I actually need.
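
For reference, the replacement step described above could be sketched roughly as follows, using only the Python standard library. The names fill_placeholders and fill_docx are made up for illustration, the tag pattern assumes upper-case letters, digits and underscores, and the code assumes document.xml is UTF-8 encoded. It also assumes each tag sits inside a single text run; Word sometimes splits text across several runs, in which case a plain string replace will not find the tag.

```python
import re
import zipfile
from xml.sax.saxutils import escape

# Matches tags such as [SOME_NAME_HERE]; the allowed characters are an assumption.
TAG_PATTERN = re.compile(r"\[([A-Z0-9_]+)\]")


def fill_placeholders(xml_text, values):
    """Replace each [TAG] with its XML-escaped value; unknown tags are left as-is."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)
        return escape(str(values[name]))
    return TAG_PATTERN.sub(substitute, xml_text)


def fill_docx(src_path, dst_path, values):
    """Copy a .docx archive, rewriting only word/document.xml along the way."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text = fill_placeholders(data.decode("utf-8"), values)
                data = text.encode("utf-8")
            dst.writestr(item, data)
```

Working on the zip archive directly avoids the separate extract-to-folder and re-zip steps, but the same fill_placeholders function applies equally to a document.xml that has already been extracted.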

During this process, depending on what is actually needed, there are sections of the document that will have to be removed from it. So my first thought was to add comments to the document.xml file. For example:

<!-- INITIAL BUILD ONLY -->
      <w:p w:rsidR="00202319" w:rsidRPr="00D00FF5" w:rsidRDefault="00202319" w:rsidP="00AC0192">
        <w:r w:rsidR="00E548A2" w:rsidRPr="00D00FF5">
          <w:rPr>
            <w:rStyle w:val="emcfontstrong"/>
          </w:rPr>
          <w:t>Some text here</w:t>
        </w:r>
      </w:p>
<!-- END INITIAL BUILD ONLY -->

Then, when I go to generate the output word document, I would simply remove all of the sections that were "INITIAL BUILD ONLY" (unless, of course, it is the initial build).
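
A minimal sketch of that removal step, assuming the markers are exactly the XML comments shown above and have not yet been stripped from document.xml. The function names are hypothetical, and the approach treats the XML as plain text rather than parsing it.

```python
import re


def strip_marked_sections(xml_text, marker):
    """Remove everything from <!-- MARKER --> through <!-- END MARKER -->."""
    name = re.escape(marker)
    pattern = re.compile(
        r"<!--\s*" + name + r"\s*-->.*?<!--\s*END " + name + r"\s*-->\s*",
        re.DOTALL,
    )
    return pattern.sub("", xml_text)


def keep_marked_sections(xml_text, marker):
    """Drop only the marker comments, keeping the content between them."""
    name = re.escape(marker)
    return re.sub(r"<!--\s*(?:END )?" + name + r"\s*-->\s*", "", xml_text)
```

For the initial build one would call keep_marked_sections(xml, "INITIAL BUILD ONLY"); for every later build, strip_marked_sections(xml, "INITIAL BUILD ONLY").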

However, the issue I am running into is that when you convert the folder back to a Word document, open it in Word and save it, Word will "clean up" the document and remove all of the comments I've added to it.

So, my question is: is there any way to preserve the comments in the document, or are there any special tags I could add to the XML that would not be visible during standard viewing/editing of the document, but would not be removed by Word upon a save?

Upvotes: 1

Views: 2833

Answers (2)

Fred the Fantastic

Reputation: 1345

To echo the previous poster, I've found that Microsoft Word does a lot of behind-the-scenes magic when opening/saving files, and I wouldn't suggest storing metadata in any of the XML files.

However, if you aren't above a bit of scripting, it shouldn't be too difficult to accomplish using any number of modules. Here's a basic Python implementation using oodocx, a module that I'm currently developing.

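from oodocx import oodocx

# Open the template document
d = oodocx.Docx('template.docx')
body = d.get_body()

# Find the paragraph containing the text, then remove it from the body
paragraph_to_remove = d.search('Some text here', result_type='paragraph')
body.remove(paragraph_to_remove)

# Write the result to a new file, leaving the template untouched
d.save('new document.docx')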

Upvotes: 0

edi9999

Reputation: 20554

Choosing to edit the document is a good choice imo!

Word makes a lot of changes when opening a docx and saving it again, so I wouldn't trust this approach. I don't even know where it would be possible to store hidden and persistent data inside document.xml.

Here's an idea to get what you need using another technique.

Hello {name}

with name="edi9999" will be replaced by Hello edi9999

{#names}
Hello {name}
{/names}

with names=[{name:"John"},{name:"Mary"},{name:"Jane"}]

will be replaced by:

Hello John
Hello Mary
Hello Jane

Now the trick to comment out a section is to use an empty array.

if names=[]

the output will be an empty string. If you want to uncomment it, use an array with one element.
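
To make the rules above concrete, here is a rough Python sketch of the same semantics on plain strings. It is not how docxgenjs is implemented, the render function is made up for illustration, and it does not handle a section nested inside another section of the same name.

```python
import re

# Either a {#section}...{/section} block or a plain {variable}.
TOKEN = re.compile(r"\{#(\w+)\}(.*?)\{/\1\}|\{(\w+)\}", re.DOTALL)


def render(template, scope):
    """Expand sections once per list item and substitute variables."""
    def expand(match):
        section, body, variable = match.groups()
        if section is not None:
            items = scope.get(section, [])
            # One copy of the body per item; an empty list yields nothing.
            return "".join(render(body, {**scope, **item}) for item in items)
        return str(scope.get(variable, ""))
    return TOKEN.sub(expand, template)
```

With this sketch, render("{#initial}Some text here{/initial}", {"initial": []}) drops the section, while passing {"initial": [{}]} keeps it exactly once.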

Inspired by Mustache

How do I build this?

I have created an implementation of this for JavaScript (works on Node and in the browser): https://github.com/edi9999/docxgenjs

There's a demo here:

http://javascript-ninja.fr/docxgenjs/examples/demo.html

Upvotes: 1
