user234702

Reputation: 229

Replace >, <, & chars that appear inside XML nodes

Regular expression to match ">", "<", "&" chars that appear inside XML nodes

I have an almost identical problem to this; however, I am using C#.

I'm not here to argue the validity of the XML.

What gets sent in is out of my control.

Input XML:

<PNODE> 
  <CNODE>This string contains > and < and & chars.</CNODE> 
</PNODE> 

I need it to look like this:

<PNODE> 
  <CNODE>This string contains &gt; and &lt; and &amp; chars.</CNODE> 
</PNODE> 

It looks like the guy found a solution for PHP, which doesn't help me.

However, I need to find a way to escape the &, > and < characters inside the node, but leave the tag declarations alone.

Upvotes: 2

Views: 9937

Answers (6)

LarsH

Reputation: 28004

I'm not here to argue the validity of the XML.

As with that other question, the right answer is that what you got sent is not XML. It's a question of well-formedness, not a question of validity in the XML sense.

What gets sent in is out of my control.

That may be true, but if someone sent you a quart of used motor oil and asked you to transform it into HTML, would you still accept it? Usually data interchange is done based on a contract (formal or informal), that the interchanged data will adhere to certain criteria. If it doesn't live up to the agreed-upon criteria, the data can be sent back, rejected.

If you're not requiring XML as input, this question is not about ">, <, & chars that appear inside XML nodes". Rather, it's about parsing SGML that looks a lot like XML, but which has < and & chars that appear in text content.

And to do that, .NET Tidy and SGMLReader are good solutions, as others have said.

Upvotes: 0

Zippit

Reputation: 1683

I've always just used replace for XML (saves me having to bring in HTTP libraries):

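// Apply to text content only; run over a whole document, this escapes the tags too.
string output = inputXml.Replace("&", "&amp;")      // must run first, or later entities get re-escaped
                        .Replace("<", "&lt;")
                        .Replace(">", "&gt;")
                        .Replace("'", "&apos;")     // optional
                        .Replace("\"", "&quot;");   // optional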
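
A blanket Replace over the whole document also escapes the angle brackets of the tags themselves. Here is a minimal sketch of one way to limit it to the text between tags. It assumes the tags are plain <NAME>, </NAME> or <NAME/> with no attributes, and that the text never contains something that itself looks like such a tag. LooseXmlEscaper and EscapeTextNodes are made-up names for illustration, not part of any library.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class LooseXmlEscaper
{
    // Assumption: a tag is <NAME>, </NAME> or <NAME/> with no attributes.
    private static readonly Regex Tag =
        new Regex(@"</?[A-Za-z_][\w.\-]*\s*/?>");

    // A bare '&' that does not already start an entity such as &amp; or &#38;.
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)");

    public static string EscapeTextNodes(string input)
    {
        var result = new StringBuilder();
        int pos = 0;

        // Copy each recognised tag verbatim and escape whatever lies between tags.
        foreach (Match tag in Tag.Matches(input))
        {
            result.Append(EscapeText(input.Substring(pos, tag.Index - pos)));
            result.Append(tag.Value);
            pos = tag.Index + tag.Length;
        }

        // Escape any text left after the last tag.
        result.Append(EscapeText(input.Substring(pos)));
        return result.ToString();
    }

    private static string EscapeText(string text)
    {
        // '&' first, so the entities produced below are not escaped again.
        return BareAmpersand.Replace(text, "&amp;")
                            .Replace("<", "&lt;")
                            .Replace(">", "&gt;");
    }
}
```

Calling LooseXmlEscaper.EscapeTextNodes on the CNODE line from the question leaves both tags untouched and turns the text into "This string contains &gt; and &lt; and &amp; chars.". Text such as "a <b> c" would still be mistaken for a tag, which is why a real SGML parser is the safer route for arbitrary input.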

Upvotes: 0

Lasse Espeholt

Reputation: 17792

You should have a look at SgmlReader:

http://developer.mindtouch.com/SgmlReader

It will give you exactly what you want :) I use it here: http://www.xmltools.dk/HtmlToXml (try it :) you can disable the html tag and the uppercase-to-lowercase tag conversion).

Upvotes: 0

Nathan Wheeler

Reputation: 5932

Check out Tidy.Net. It's a .Net implementation of Tidy.

Upvotes: 1

mledbetter

Reputation: 88

Use HttpUtility (in System.Web):

HttpUtility.HtmlEncode("<text to Encode>");
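
HtmlEncode has to be given the text content only; passed a whole document, it encodes the tags as well. If referencing System.Web is not desirable, System.Security.SecurityElement.Escape in the base class library handles the five XML special characters. A small sketch (the TextEscaping class and the EscapeForXml name are made up for illustration):

```csharp
using System.Security;

public static class TextEscaping
{
    // Escapes <, >, &, " and ' in a piece of text content.
    // Pass in the text of a node, not the surrounding markup.
    public static string EscapeForXml(string text)
    {
        return SecurityElement.Escape(text);
    }
}
```

For example, TextEscaping.EscapeForXml("This string contains > and < and & chars.") returns "This string contains &gt; and &lt; and &amp; chars.". Finding the text content in the first place is still the hard part when the input is not well-formed.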

Upvotes: 0

Kevin LaBranche

Reputation: 21088

There are a couple of .NET wrappers around the Tidy library.

http://users.rcn.com/creitzel/tidy.html#dotnet

http://www.codeproject.com/KB/mcpp/eftidynet.aspx

There is also a .NET port of Tidy.

Upvotes: 0
