Replace some char within a string (XML format)

Question

I was given a String variable with the following content:

```xml
<main>
<Content content="bla bla bla... by <1% to ??? on other bla bla...." />
</main>
```

This string will eventually be passed to a Stored Procedure that processes it with XQuery.

As you can see, the value of the `content` attribute on the `Content` element contains the character "<", which causes an error when the Stored Procedure tries to parse the string.

My question is how to convert that "<" into `&lt;` (in this case, `<1%` into `&lt;1%`) efficiently.

I want to keep all the other "<" characters (the ones that start tags) as they are.

Thanks.

Dai · Accepted Answer

Since you updated your question to point out that you are dealing with XML, and the unencoded values are in attribute values rather than #text nodes, the problem is somewhat simpler: extract the attribute value using an approach similar to my previous answer, use a library function to entitize it, then write it back out.

Note that CDATA sections can only be used in #text content, not in attribute values.

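```csharp
using System;

// The document from the question, as a verbatim string ("" is an escaped double quote).
String doc =
@"<main>
<Content content=""bla bla bla... by <1% to ??? on other bla bla...."" />
</main>";

// Find the <Content element, then the start and end of its content="..." attribute value.
Int32 contentOpenStart = doc.IndexOf("<Content");
Int32 contentAttribContentValueStart = doc.IndexOf("content=\"", contentOpenStart) + "content=\"".Length;
Int32 contentAttribContentValueEnd   = doc.IndexOf("\"", contentAttribContentValueStart);

// Substring takes (startIndex, length), so the length is end - start.
String attributeValueOld = doc.Substring( contentAttribContentValueStart, contentAttribContentValueEnd - contentAttribContentValueStart );
String attributeValueNew = System.Net.WebUtility.HtmlEncode( attributeValueOld );

// Rebuild the document: the text before the value, the encoded value,
// then the rest of the document starting at the closing quote.
String doc2 = String.Concat(
    doc.Substring( 0, contentAttribContentValueStart ),
    attributeValueNew,
    doc.Substring( contentAttribContentValueEnd )
);
```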

`doc2` then contains the document with the fixed attribute value.
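If the stray `<` characters can appear anywhere in the document rather than only in this one attribute, a more general alternative (a heuristic regular-expression substitution, not the approach used in this answer) is to escape every `<` that cannot start markup. This minimal sketch assumes that every real tag begins with a letter or `_` (element name), `/` (end tag), `!` (comment, CDATA or DOCTYPE) or `?` (processing instruction); the pattern is an illustration, not a validated rule.

```csharp
using System;
using System.Text.RegularExpressions;

// The document from the question.
string doc =
@"<main>
<Content content=""bla bla bla... by <1% to ??? on other bla bla...."" />
</main>";

// Heuristic (assumption, not a real XML parser): a '<' that starts markup is
// followed by a letter, '_', '/', '!' or '?'. Every other '<', such as the
// one in "<1%", is replaced with &lt;.
string fixedDoc = Regex.Replace(doc, "<(?![A-Za-z_/!?])", "&lt;");

Console.WriteLine(fixedDoc);
```

This would still miss a stray `<` that happens to be followed by a letter (for example `a <b`), and it does nothing about a bare `&`, so the attribute-based approach above is safer when you know where the bad value is.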

Note that using `HtmlEncode` for this is not strictly correct for XML, because XML's set of predefined entities is much smaller than HTML's: XML only predefines entities for `&`, `<`, `>`, `"` and `'`, and all other characters should appear in the document as raw (native) characters.
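One way to escape exactly those five characters (an alternative to `HtmlEncode`, not what the answer itself uses) is `System.Security.SecurityElement.Escape`, which replaces `<`, `>`, `"`, `'` and `&` with `&lt;`, `&gt;`, `&quot;`, `&apos;` and `&amp;`. A minimal sketch, assuming the attribute value has already been extracted as in the code above (the hard-coded value here is just the sample from the question):

```csharp
using System;
using System.Security;

// Sample attribute value from the question, assumed already extracted from the document.
string attributeValueOld = "bla bla bla... by <1% to ??? on other bla bla....";

// SecurityElement.Escape only touches the five XML-special characters:
// < > " ' &  become  &lt; &gt; &quot; &apos; &amp;
string attributeValueNew = SecurityElement.Escape(attributeValueOld);

Console.WriteLine(attributeValueNew);
// prints: bla bla bla... by &lt;1% to ??? on other bla bla....
```

The resulting `attributeValueNew` can be passed to the same `String.Concat` call in place of the `HtmlEncode` result.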
