Charlie Salts

Reputation: 13488

Strip Comments from XML

I've encountered the need to remove comments of the form:

<!--  Foo

      Bar  -->

I'd like to use a regular expression that matches anything (including line breaks) between the beginning and end 'delimiters.'

What would a good regex be for this task?

Upvotes: 3

Views: 864

Answers (5)

Contango

Reputation: 80192

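Here is a complete sample that reads an XML file and returns its contents as a string with the comments removed.

using System.IO;
using System.Text.RegularExpressions;

static string StripXmlComments()
{
  // Verbatim string, so the backslash is not an escape ("\f" would be a form feed).
  var text = File.ReadAllText(@"c:\file.xml");

  // [^-] also matches line breaks, so this pattern spans multi-line
  // comments without needing any RegexOptions.
  const string strRegex = @"<!--(?:[^-]|-(?!->))*-->";
  Regex myRegex = new Regex(strRegex, RegexOptions.None);

  // Replace every comment with the empty string.
  return myRegex.Replace(text, "");
}

Note that RegexOptions.Multiline will not help with patterns that rely on the dot, such as <!--.*?--> (which is slightly counterintuitive). Multiline only changes how ^ and $ match; it is RegexOptions.Singleline that lets the dot match line breaks. The pattern above avoids the dot altogether, so it works with or without either option.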

Upvotes: 0

Anonymous

Reputation:

Parsing XML with regular expressions is generally considered bad practice. Use an XML parsing library instead.
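As a minimal sketch of what the library route could look like in .NET, assuming LINQ to XML (System.Xml.Linq) is available: the class and method names below are made up for illustration and do not come from any answer in this thread.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, named for illustration only.
public static class XmlCommentStripper
{
    // Parses the XML, removes every comment node, and returns the re-serialized markup.
    public static string StripComments(string xml)
    {
        // PreserveWhitespace keeps whitespace-only text nodes from the input.
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);

        // DescendantNodes() walks every node in the document, at any depth.
        // The Remove() extension snapshots the sequence before deleting.
        doc.DescendantNodes().OfType<XComment>().Remove();

        // DisableFormatting avoids re-indenting the output.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Because the parser knows the difference between a comment node and a CDATA section, comment-like text inside CDATA is left alone. One caveat: XDocument.ToString() does not write the XML declaration, so use Save() if you need it.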

Upvotes: 0

Diadistis

Reputation: 12174

The simple way:

Regex xmlCommentsRegex = new Regex("<!--.*?-->", RegexOptions.Singleline | RegexOptions.Compiled);

And a better way:

Regex xmlCommentsRegex = new Regex("<!--(?:[^-]|-(?!->))*-->", RegexOptions.Singleline | RegexOptions.Compiled);
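For anyone who wants to try the two patterns, here is a small sketch that applies them to a multi-line comment like the one in the question. The class, field, and method names are mine, and the third regex (the dot pattern without Singleline) is included only to show why that option matters for the simple version.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, named for illustration only.
public static class CommentRegexDemo
{
    // Simple pattern: needs Singleline so that '.' also matches line breaks.
    public static readonly Regex Lazy =
        new Regex("<!--.*?-->", RegexOptions.Singleline);

    // Same pattern without Singleline: '.' stops at '\n'.
    public static readonly Regex LazyNoSingleline =
        new Regex("<!--.*?-->");

    // Explicit pattern: [^-] matches line breaks, so no option is required.
    public static readonly Regex Explicit =
        new Regex("<!--(?:[^-]|-(?!->))*-->");

    // Removes every match of the given regex from the input.
    public static string Strip(Regex regex, string xml)
    {
        return regex.Replace(xml, "");
    }
}
```

With the input "<a><!--  Foo\n\n      Bar  --><b/></a>", Lazy and Explicit both leave "<a><b/></a>", while LazyNoSingleline finds no match and returns the input unchanged.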

Upvotes: 5

Chris Nava

Reputation: 6802

The 'proper' way would be to use XSLT and copy everything but comments.
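A rough sketch of that idea from C#, assuming XslCompiledTransform from System.Xml.Xsl is acceptable: the stylesheet is the usual identity transform plus an empty template for comment(), and the class and method names are invented for this example.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, named for illustration only.
public static class XsltCommentStripper
{
    // Identity transform: copy every attribute and node as-is,
    // except comments, which match an empty template and are dropped.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:template match=""@*|node()"">
    <xsl:copy><xsl:apply-templates select=""@*|node()""/></xsl:copy>
  </xsl:template>
  <xsl:template match=""comment()"" priority=""1""/>
</xsl:stylesheet>";

    public static string StripComments(string xml)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsltReader);
        }

        // Omit the XML declaration so the result is just the transformed markup.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            transform.Transform(input, writer);
        }
        // The writer has been disposed (and flushed) by this point.
        return output.ToString();
    }
}
```

The explicit priority on the comment() template is there because node() and comment() otherwise have the same default priority in XSLT 1.0. Keep in mind that the XSLT data model treats CDATA sections as plain text, so they come out escaped rather than as CDATA unless the stylesheet asks for them via cdata-section-elements.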

Upvotes: 4

yogman

Reputation: 4131

None. XML is not a regular language, so it cannot be fully described by a regular expression, which can only describe regular grammars.

Suppose this thread were exported as XML. Your example (<!-- Foo Bar -->), if enclosed in a CDATA section, would be stripped out even though it is character data there, not a comment.
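To make the CDATA problem concrete, here is a small sketch that runs a comment-stripping regex (the one from the other answers) over such a document. The class and method names are made up for the demonstration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, named for illustration only.
public static class CdataPitfallDemo
{
    // Comment pattern taken from the regex-based answers in this thread.
    private static readonly Regex CommentRegex =
        new Regex("<!--(?:[^-]|-(?!->))*-->");

    // Blindly removes anything that looks like a comment,
    // with no awareness of CDATA sections.
    public static string NaiveStrip(string xml)
    {
        return CommentRegex.Replace(xml, "");
    }
}
```

Calling NaiveStrip("<post><![CDATA[<!-- Foo Bar -->]]></post>") returns "<post><![CDATA[]]></post>": the text of the post is gone, although the document never contained a real comment.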

Upvotes: 6
