Reputation: 35544
I'm trying to create reports from .docx templates using the Open XML SDK 2.5. Within the templates I have defined some placeholders that get replaced by real values. The placeholders can be written in various formats, such as
<#Name#>
or
<!#Name#!>
or
#Name#
or
{{Name}}
Any other placeholder format is fine as well, as long as the placeholders can be clearly identified within the text.
The problem I am currently facing is that a placeholder is often split among multiple <w:t> elements (DocumentFormat.OpenXml.Wordprocessing.Text) within a <w:p> element (DocumentFormat.OpenXml.Wordprocessing.Paragraph). An example:
<w:p w:rsidR="003137E0" w:rsidRDefault="008C62F1" w:rsidP="00D43D55">
<w:r>
<w:t xml:space="preserve">#FirstName# </w:t>
</w:r>
<w:r w:rsidR="00C93A70">
<w:t>#LastName</w:t>
</w:r>
<w:r w:rsidR="005F49B7">
<w:t>#</w:t>
</w:r>
</w:p>
Here the placeholder #FirstName# is easily identifiable because it sits within one <w:t> element, but the placeholder #LastName# is split among multiple <w:t> elements, so I cannot use a simple regex on the text of the document, like:
Regex placeholderRegex = new Regex(@"#[\w]*#");
var matchingTexts = document.MainDocumentPart.Document.Body.Descendants<Text>().Where(t => placeholderRegex.IsMatch(t.Text));
I have no control over how the templates are defined, and I do not want to put constraints on how users create them. It is also not clear to me when a placeholder gets split into multiple <w:t> elements.
Another example, using {{[\w]*}} as the format for placeholders.
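To make the effect concrete, here is a minimal sketch in plain C# (no SDK needed) that uses the three run texts from the example paragraph above. The class and method names are made up for illustration. It compares searching each text fragment on its own with searching the concatenated paragraph text, using the same regex as in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo type; the names are not part of the Open XML SDK.
static class SplitPlaceholderDemo
{
    // Contents of the three <w:t> elements of the example paragraph, in document order.
    public static readonly string[] RunTexts = { "#FirstName# ", "#LastName", "#" };

    // Same pattern as in the question.
    public static readonly Regex Placeholder = new Regex(@"#[\w]*#");

    // Placeholders found when every <w:t> is inspected in isolation.
    public static string[] FindPerTextElement()
    {
        return RunTexts
            .SelectMany(t => Placeholder.Matches(t).Cast<Match>())
            .Select(m => m.Value)
            .ToArray();
    }

    // Placeholders found in the concatenated text of the whole paragraph.
    public static string[] FindInParagraphText()
    {
        return Placeholder.Matches(string.Concat(RunTexts))
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

FindPerTextElement() only yields #FirstName#, while FindInParagraphText() yields both #FirstName# and #LastName#. So the search has to work on the joined paragraph text, and the match then has to be mapped back to the individual <w:t> elements.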
Text (Docx)
{{Ort}}
And this {{placeholder}} is within the text
Xml (OpenXML)
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
<w:body>
<w:p w:rsidR="007B60F2" w:rsidRDefault="00BB7370" w:rsidP="00D43D55">
<w:pPr>
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00114EA7">
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
<w:t>{{</w:t>
</w:r>
<w:r w:rsidR="00C93A70" w:rsidRPr="00114EA7">
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
<w:t>Ort</w:t>
</w:r>
<w:r w:rsidR="00114EA7" w:rsidRPr="00114EA7">
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
<w:t>}}</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00EC3BED" w:rsidRPr="00114EA7" w:rsidRDefault="00C310E0" w:rsidP="00D43D55">
<w:pPr>
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00114EA7">
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
<w:t xml:space="preserve">This is a text with a </w:t>
</w:r>
<w:r w:rsidR="00A07A5D">
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
<w:t>{{</w:t>
</w:r>
<w:r w:rsidRPr="00114EA7">
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
<w:t>placeholder</w:t>
</w:r>
<w:r w:rsidR="00A07A5D">
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
<w:t>}</w:t>
</w:r>
<w:r w:rsidR="00114EA7" w:rsidRPr="00114EA7">
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
<w:t>}</w:t>
</w:r>
<w:bookmarkStart w:id="0" w:name="_GoBack" />
<w:bookmarkEnd w:id="0" />
<w:r w:rsidR="00A07A5D">
<w:rPr>
<w:lang w:val="en-US" />
</w:rPr>
<w:t>.</w:t>
</w:r>
</w:p>
<w:sectPr w:rsidR="00EC3BED" w:rsidRPr="00114EA7" w:rsidSect="00237721">
<w:pgSz w:w="11906" w:h="16838" />
<w:pgMar w:top="1417" w:right="1417" w:bottom="1134" w:left="1417" w:header="708" w:footer="708" w:gutter="0" />
<w:cols w:space="708" />
<w:docGrid w:linePitch="360" />
</w:sectPr>
</w:body>
</w:document>
So my question is: what is the way to search for and replace placeholders with values using the Open XML SDK? Is there some functionality within the SDK that can help me? Has anybody else solved this problem and can provide assistance?
Upvotes: 9
Views: 3519
Reputation: 3625
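I would do it with something like this (not tested, but I think it will help you):
// Requires: using System.Collections.Generic; using System.Text.RegularExpressions; using System.Xml.Linq;
// Element names must be qualified with the WordprocessingML namespace; a literal "w:p" is not a valid XName.
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
List<string> placeHolders = new List<string>();
// Load the XML string
var doc = XDocument.Parse(xml);
// or, to load from a file, use XDocument.Load("path_to_xml_file.xml");
// Get all <w:p> elements (they sit below <w:body>, so use Descendants instead of Root.Elements)
var wpElements = doc.Descendants(w + "p");
foreach (var wp in wpElements)
{
    var wrElements = wp.Descendants(w + "r");
    foreach (var wr in wrElements)
    {
        var wt = (string)wr.Element(w + "t");
        if (wt == null) continue; // run without a <w:t> child
        if (Regex.IsMatch(wt, @"\w") || placeHolders.Count == 0)
        {
            // The fragment contains a word character: start a new entry
            placeHolders.Add(wt);
        }
        else
        {
            // No word character found: append the fragment to the last entry
            placeHolders[placeHolders.Count - 1] += wt;
        }
    }
}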
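The snippet above only collects text fragments. For the actual replacement, one possible approach (a sketch under my own assumptions, not something the SDK provides) is to join the text of all <w:t> elements of a paragraph, match the regex against the joined string, and then write the replacement into the first <w:t> that holds part of the match while cutting the matched part out of the following ones. PlaceholderReplacer and ReplaceInParagraph are hypothetical names, and the code works on System.Xml.Linq elements rather than SDK classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper; not part of the Open XML SDK.
static class PlaceholderReplacer
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces placeholders inside one <w:p>, even if they span several <w:t> elements.
    // 'values' maps the full placeholder text (e.g. "#LastName#") to its replacement;
    // placeholders without an entry are left untouched.
    public static void ReplaceInParagraph(XElement paragraph, Regex pattern,
                                          IDictionary<string, string> values)
    {
        List<XElement> texts = paragraph.Descendants(W + "t").ToList();
        int searchFrom = 0;
        while (true)
        {
            // Re-read the joined text on every pass, since the last pass may have changed it.
            string full = string.Concat(texts.Select(t => t.Value));
            if (searchFrom > full.Length) break;
            Match m = pattern.Match(full, searchFrom);
            if (!m.Success) break;

            string replacement;
            if (!values.TryGetValue(m.Value, out replacement))
            {
                searchFrom = m.Index + Math.Max(m.Length, 1); // skip unknown placeholder
                continue;
            }

            int offset = 0;        // start of the current <w:t> within 'full'
            bool inserted = false; // replacement already written to an earlier <w:t>?
            foreach (XElement t in texts)
            {
                string v = t.Value;
                int from = Math.Max(m.Index - offset, 0);
                int to = Math.Min(m.Index + m.Length - offset, v.Length);
                if (from < to) // this <w:t> contains a part of the match
                {
                    t.Value = v.Substring(0, from)
                            + (inserted ? "" : replacement)
                            + v.Substring(to);
                    inserted = true;
                }
                offset += v.Length;
            }
            searchFrom = m.Index + replacement.Length; // continue after the inserted value
        }
    }
}
```

Known limitations of this sketch: the inserted value takes the formatting of the run where the placeholder starts, emptied <w:t> elements are left in place, and xml:space="preserve" is not added when a replacement starts or ends with whitespace. With the SDK classes the same idea should carry over by using Paragraph.Descendants<Text>() and the Text.Text property instead of XElement.Value.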
Upvotes: 3
Reputation: 15878
Please see "docx4j does not replace variables" for a link to Java source code that solves the problem.
You could implement something similar in C#, or use that code via http://www.nuget.org/packages/docx4j.NET/3.0.1
Upvotes: 3
Reputation: 11
Yes, the MS Word application splits even a single word into multiple Run/Text elements (for some reason). And no, the Open XML SDK does not provide any Find/Replace functionality. But you can create your own for the simplest Paragraph/Run/Text structure. You will need to:
Upvotes: 2