Word VSTO: get text of document with hidden characters

Question

I'm developing a text analysis VSTO add-in for Word. To do that, I get the text of the active document like this:

Globals.ThisAddIn.Application.ActiveDocument.Content.Text

Then I analyze it. The analysis returns a list of character ranges that Word should attach comments to (for example, characters 3-6 and characters 10-13).

The problem is that the comment on characters 3 to 6 seems to add a hidden character to the document, because every comment Word inserts after the first one is placed one character too early.

Is there a way to fix that, or to get the text including the hidden characters?

I found TextRetrievalMode, but I could not get it to work for this.

Cindy Meister · Accepted Answer

Basically, the answer is "No, you can't do it the way you propose."

Yes, Word does add "hidden characters" to the text flow that cannot be picked up using the object model. Trying to work with character index values is not going to work reliably. The reliable method is Word's built-in Find/Replace with wildcards. If RegEx is absolutely necessary, then some kind of Find/Replace within a character-index range (say, starting 5 characters before and ending 5 characters after the indices computed using RegEx) might be a way to double-check the result and pick up the correct Range.
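The double-check described above could be sketched roughly as follows. This is a minimal illustration, not a tested implementation: `CommentPlacer` and `FindNear` are hypothetical names, `matchedText` is assumed to be the exact string the external analysis matched, and the padding of 5 characters simply mirrors the figure mentioned in the answer. It relies on Word's `Find.Execute`, which shrinks the searched Range to the hit when it succeeds.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class CommentPlacer
{
    // Hypothetical helper: search for matchedText in a small window around the
    // index computed outside Word, and return the Range that Word itself found.
    // Returns null when the text is not inside the window.
    public static Word.Range FindNear(Word.Document doc, string matchedText,
                                      int approxStart, int padding = 5)
    {
        int docEnd = doc.Content.End;
        int windowStart = Math.Max(0, approxStart - padding);
        int windowEnd = Math.Min(docEnd, approxStart + matchedText.Length + padding);

        Word.Range window = doc.Range(windowStart, windowEnd);
        Word.Find find = window.Find;
        find.ClearFormatting();
        find.Forward = true;
        find.Wrap = Word.WdFindWrap.wdFindStop; // do not search outside the window
        find.MatchWildcards = false;            // literal search for the matched text

        // On success, 'window' is redefined to cover exactly the found text.
        bool found = find.Execute(FindText: matchedText, MatchCase: true);
        return found ? window : null;
    }
}
```

A caller would then pass the returned Range to `doc.Comments.Add(range, "some comment text")` instead of building a Range from the raw indices. Note that Find text is limited to 255 characters, so very long matches would need to be located by a shorter prefix; whether a fixed padding is wide enough after many inserted comments is also an assumption that would need checking against real documents.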

Possibly, depending on what kind of analysis this is, it might be better to work with the closed file, leveraging the Office Open XML. That will not have the problem of "hidden characters" that Word uses for structural information. On the other hand, there's a lot of formatting information that breaks up text runs that needs to be contended with...
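As a rough illustration of the closed-file route, the sketch below reads paragraph text straight from the package using only the .NET ZIP and LINQ to XML classes. `DocxText` and `ReadParagraphs` are hypothetical names. It assumes the main document part is stored at `word/document.xml` (the usual location, though the package relationships are what formally define it), and it deliberately ignores tabs, line breaks, headers, footnotes and text boxes, so it shows the idea rather than a complete extractor.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class DocxText
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns one string per w:p element, built by
    // concatenating the w:t text of all runs inside that paragraph.
    public static List<string> ReadParagraphs(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // Assumption: the main part lives at the conventional path.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in package.");

            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                return xml.Descendants(W + "p")
                          .Select(p => string.Concat(
                              p.Descendants(W + "t").Select(t => t.Value)))
                          .ToList();
            }
        }
    }
}
```

Joining the `w:t` elements per paragraph is what deals with text being split across several runs by formatting. Positions found in this text are offsets into the XML content, not Word Range indices, so writing comments back would also have to happen at the XML level (or through the Open XML SDK) rather than through the object model.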
