Preeti

Reputation: 47

How to remove invalid characters from XML in C#

I have an XML document that contains a lot of text like the following:

<EmployeeId>&EmpId;</EmployeeId>
<Department>&Dept;</Department>

I need to remove the & character so that the document is proper XML and can be validated against the XSD. How can I achieve this?

Upvotes: 2

Views: 10089

Answers (5)

FranckPom

Reputation: 71

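You can also use SecurityElement.Escape (in the System.Security namespace) when creating your nodes:

xml = "<myxmlnode>" + SecurityElement.Escape(value) + "</myxmlnode>";

This escapes the characters that are reserved in XML markup (<, >, &, " and ') so that they no longer break the document.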
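As a minimal sketch of this suggestion, the snippet below wraps an escaped value in an element. The class name, the sample value "R&D <north>", and the Department element are assumptions made for illustration; only SecurityElement.Escape comes from the answer.

```csharp
using System;
using System.Security;

class EscapeDemo
{
    static void Main()
    {
        // Hypothetical value containing characters that are reserved in XML.
        string value = "R&D <north>";

        // SecurityElement.Escape replaces <, >, &, " and ' with entity references.
        string xml = "<Department>" + SecurityElement.Escape(value) + "</Department>";

        Console.WriteLine(xml); // <Department>R&amp;D &lt;north&gt;</Department>
    }
}
```

Note that this only helps while the XML is being built from raw values; running it over a whole existing document would also escape the markup itself.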

Upvotes: 5

SoftArtisans

Reputation: 536

I suggest taking a look at the XmlConvert class. You can use it to encode and decode XML names and to check for characters that are illegal according to the XML specification. As already pointed out, removing text such as & would actually change the underlying data, so you should encode and decode as needed instead.
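A rough sketch of how XmlConvert can be used to drop characters that XML 1.0 does not allow at all, such as most control characters. The helper name StripIllegalXmlChars and the sample input are hypothetical. This is a different problem from the & in the question, which is a legal character that merely needs escaping.

```csharp
using System;
using System.Text;
using System.Xml;

class XmlCharDemo
{
    // Hypothetical helper: keeps only characters that are legal in XML 1.0.
    static string StripIllegalXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // Keep a valid high/low surrogate pair together.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else is silently dropped.
        }
        return sb.ToString();
    }

    static void Main()
    {
        // U+0001 is a control character that XML 1.0 forbids.
        Console.WriteLine(StripIllegalXmlChars("Dept\u0001A")); // DeptA
    }
}
```

If failing loudly is preferable to silently dropping data, XmlConvert.VerifyXmlChars throws an XmlException on the first illegal character instead.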

Upvotes: 0

Jon Raynor

Reputation: 3892

Some characters need to be encoded properly if they are used in XML. The ampersand & is one of them.

Have a look at the link below. You will need to encode these characters if they are part of the data contained in the XML.

http://support.microsoft.com/kb/316063

Upvotes: 0

Christoph Fink

Reputation: 23113

You could do one of the following:

string content = System.IO.File.ReadAllText("PATH");
System.IO.File.WriteAllText("PATH", content.Replace("&", String.Empty));

or

string content = System.IO.File.ReadAllText("PATH");
System.IO.File.WriteAllText("PATH", content.Replace("&amp;", "&").Replace("&", "&amp;"));

The "double Replace" is to avoid creating "&amp;amp;".
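The double Replace would also turn other entities such as &lt; into &amp;lt;. One possible alternative, which is not part of the answer above, is a regular expression that escapes only ampersands that do not start a predefined or numeric entity. The pattern, class name, and sample strings below are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class AmpersandDemo
{
    // Matches "&" unless it begins one of the five predefined entities
    // or a decimal/hexadecimal character reference.
    static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static string EscapeBareAmpersands(string content)
    {
        return BareAmpersand.Replace(content, "&amp;");
    }

    static void Main()
    {
        // The undeclared reference from the question is escaped.
        Console.WriteLine(EscapeBareAmpersands("<EmployeeId>&EmpId;</EmployeeId>"));
        // <EmployeeId>&amp;EmpId;</EmployeeId>

        // Entities that are already valid are left alone.
        Console.WriteLine(EscapeBareAmpersands("<a>x &amp; y &lt; z</a>"));
        // <a>x &amp; y &lt; z</a>
    }
}
```

This treats &EmpId; as literal text. If those references are meant to be real entities declared in a DTD, escaping them would change their meaning.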

Upvotes: 0

C.Evenhuis

Reputation: 26446

I wouldn't recommend removing data just to "fix" an issue. The correct way to include the & character in XML data is to write it as &amp;. You could use XmlWriter or another class from the framework to create the XML and let it handle the escaping for you.
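A minimal sketch of the XmlWriter approach, assuming the output is collected in a StringBuilder. The element names and the value "R&D" are made up for the example.

```csharp
using System;
using System.Text;
using System.Xml;

class WriterDemo
{
    static void Main()
    {
        var sb = new StringBuilder();
        // Omit the XML declaration to keep the output short.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Employee");
            // The writer escapes the ampersand in the text automatically.
            writer.WriteElementString("Department", "R&D");
            writer.WriteEndElement();
        }

        Console.WriteLine(sb.ToString());
        // <Employee><Department>R&amp;D</Department></Employee>
    }
}
```

Because the writer does the escaping, the raw value stays intact and is recovered unchanged when the document is parsed again.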

Upvotes: 3
