Reputation: 13488
I've encountered the need to remove comments of the form:
<!-- Foo
Bar -->
I'd like to use a regular expression that matches anything (including line breaks) between the beginning and end 'delimiters.'
What would a good regex be for this task?
Upvotes: 3
Views: 864
Reputation: 80192
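Here is a complete sample that reads an XML file and returns its contents as a string with the comments removed.
using System.IO;
using System.Text.RegularExpressions;

// Wrapper method (the name is illustrative) so that the return statement compiles.
static string RemoveXmlComments()
{
    // Verbatim string: in a regular literal, "\f" would be read as a form-feed escape.
    var text = File.ReadAllText(@"c:\file.xml");
    // Match "<!--", then any characters that do not begin "-->", then "-->".
    const string strRegex = @"<!--(?:[^-]|-(?!->))*-->";
    const RegexOptions myRegexOptions = RegexOptions.Multiline;
    Regex myRegex = new Regex(strRegex, myRegexOptions);
    const string strReplace = @"";
    string result = myRegex.Replace(text, strReplace);
    return result;
}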
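Note that for a pattern that relies on '.', RegexOptions.Multiline alone will not do the trick (which is slightly counterintuitive): Multiline only changes how ^ and $ match, while RegexOptions.Singleline is the option that lets '.' match line breaks. The pattern above avoids '.' altogether, so it matches across lines whichever option is set.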
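To make the difference between the two options concrete, here is a minimal sketch that runs the dot-based pattern `<!--.*?-->` under each option. The class name, method name, and sample string are assumptions made up for illustration; they do not come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentRegexDemo
{
    // Hypothetical sample input: one comment that spans two lines.
    public const string Sample = "<root><!-- Foo\nBar --><item/></root>";

    public static string Strip(string xml, RegexOptions options)
    {
        // '.' only crosses '\n' when RegexOptions.Singleline is set.
        return Regex.Replace(xml, "<!--.*?-->", "", options);
    }

    public static void Main()
    {
        // Multiline affects '^' and '$' only, so the comment is left in place.
        Console.WriteLine(Strip(Sample, RegexOptions.Multiline));
        // Singleline lets '.' match the line break, so the comment is removed.
        Console.WriteLine(Strip(Sample, RegexOptions.Singleline));
    }
}
```

With Multiline the input comes back unchanged; with Singleline the result is `<root><item/></root>`.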
Upvotes: 0
Reputation:
Parsing XML with regex is considered bad style. Use some XML parsing library.
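As one possible way to do this with a parser, here is a rough sketch using LINQ to XML (`System.Xml.Linq`). The class and method names are hypothetical, and it assumes the input is well-formed XML that fits comfortably in memory.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlCommentStripper
{
    public static string RemoveComments(string xml)
    {
        // Parse into a tree; PreserveWhitespace keeps the original spacing intact.
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);

        // Select every comment node in the document and detach it from the tree.
        doc.DescendantNodes().OfType<XComment>().Remove();

        // Serialize without re-indenting. Note that ToString() omits the XML declaration.
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveComments("<root><!-- Foo\nBar --><item>text</item></root>"));
    }
}
```

Because the parser knows which nodes are really comments, text that merely looks like a comment (for example inside a CDATA section) is left alone.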
Upvotes: 0
Reputation: 12174
The simple way:
Regex xmlCommentsRegex = new Regex("<!--.*?-->", RegexOptions.Singleline | RegexOptions.Compiled);
And a better way:
Regex xmlCommentsRegex = new Regex("<!--(?:[^-]|-(?!->))*-->", RegexOptions.Singleline | RegexOptions.Compiled);
Upvotes: 5
Reputation: 6802
The 'proper' way would be to use XSLT and copy everything but comments.
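A sketch of what that could look like from C#, assuming `XslCompiledTransform` is available: an identity template copies every node, and a second, empty template swallows comments. The class name, method name, and the explicit `priority` attribute are choices made for this illustration, not something prescribed by the answer.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltCommentStripper
{
    // Identity transform plus an empty template that drops comment nodes.
    // The explicit priority avoids a tie between node() and comment().
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:template match=""@*|node()"">
    <xsl:copy><xsl:apply-templates select=""@*|node()""/></xsl:copy>
  </xsl:template>
  <xsl:template match=""comment()"" priority=""1""/>
</xsl:stylesheet>";

    public static string RemoveComments(string xml)
    {
        var transform = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(styleReader);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            // Copies the input to the writer, skipping every comment node.
            transform.Transform(input, writer);
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveComments("<root><!-- Foo\nBar --><item>text</item></root>"));
    }
}
```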
Upvotes: 4
Reputation: 4131
None. XML is not a regular language, so no regular expression can describe it in full; a regex sees only text, not the structure a parser sees.
Let's say this thread is exported as XML. Your example (<!-- FOO Bar -->), if enclosed in a CDATA section, would be removed as well, even though it is not a comment there.
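The CDATA case can be reproduced with a few lines. This is only a sketch: the sample document and the names are made up for illustration, and the pattern is the one posted elsewhere in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CdataPitfallDemo
{
    // Hypothetical export: the comment-like text is literal content inside CDATA.
    public const string Sample = "<post><![CDATA[Your example (<!-- Foo Bar -->)]]></post>";

    public static string StripWithRegex(string xml)
    {
        // The regex has no notion of CDATA, so it removes the literal text too.
        return Regex.Replace(xml, "<!--(?:[^-]|-(?!->))*-->", "");
    }

    public static void Main()
    {
        // Prints: <post><![CDATA[Your example ()]]></post>
        Console.WriteLine(StripWithRegex(Sample));
    }
}
```

The quoted example has disappeared from the post body, although the document never contained a real comment.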
Upvotes: 6