Reputation: 3
I'm looking for a regular expression that detects whether or not a string is anything more than a bunch of HTML tags.
So, desired functionality is:
Input -> Output
"<html></html>" -> False
"<html>Hi</html>" -> True
"<a href='google.com'>Click Me</a>" -> True
"hello" -> True
"<bold><italics></bold></italics>" -> False
"" -> Don't care
Once upon a time I could have done this myself, but it's been too long.
Thanks in advance.
Edit: I don't care if they are real HTML tags. Let's call anything inside <> a tag. I also don't care whether a start tag matches up with an end tag.
Upvotes: 0
Views: 1264
Reputation: 36035
I once used this to strip out HTML tags:
const string tagsPatterns = "\\s*<.*?>\\s*";
value = System.Text.RegularExpressions.Regex.Replace(value, tagsPatterns, " ");
You can adapt it (this version replaced each tag with a space because it wanted to keep whitespace): strip the tags to get the string without them, then check whether what remains is empty.
Update 1: Here it goes :)
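bool HasText(string value)
{
    // Lazy match: '<', as few characters as possible, then '>'.
    // '.' does not match a newline by default, so a tag that spans lines is left in place.
    const string tagsPatterns = "<.*?>";
    value = System.Text.RegularExpressions.Regex.Replace(value, tagsPatterns, "");
    // Anything other than whitespace left after stripping counts as text.
    return value.Trim() != "";
}

[TestMethod]
public void TestMethod2()
{
    Assert.IsFalse(HasText("<html></html>"));
    Assert.IsTrue(HasText("<html>Hi</html>"));
    Assert.IsTrue(HasText("<a href='google.com'>Click Me</a>"));
    Assert.IsTrue(HasText("hello"));
    Assert.IsFalse(HasText("<bold><italics></bold></italics>"));
    // The question marks this case as "don't care"; this implementation returns false.
    Assert.IsFalse(HasText(""));
}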
Upvotes: 0
Reputation: 12206
Here's an article written by Phil Haack about using a regular expression to match HTML.
Also, if you want a simple line of code, consider loading the string into an XmlDocument. It parses the string, so you'll know whether you have well-formed XML or not: LoadXml throws an XmlException when the input does not parse.
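A rough sketch of the XmlDocument idea, in case it helps. The class name, the method name, and the "<root>" wrapper element are my own assumptions, not anything from the question; the wrapper is only there so that bare text such as "hello" still has a single root element. Note that this route is stricter than the question asks for: mismatched tags such as "<bold><italics></bold></italics>" are not well-formed XML, so the sketch reports them as "could not parse" (null) rather than False.

```csharp
using System.Xml;

public static class XmlTextCheck
{
    // Returns true or false when the input parses as XML,
    // or null when it is not well-formed and no answer can be given.
    public static bool? HasText(string value)
    {
        var doc = new XmlDocument();
        try
        {
            // Hypothetical wrapper element: gives bare text and
            // several top-level tags one common root.
            doc.LoadXml("<root>" + value + "</root>");
        }
        catch (XmlException)
        {
            return null;
        }

        // InnerText concatenates the text of all descendant nodes.
        return doc.DocumentElement.InnerText.Trim().Length > 0;
    }
}
```

Since the question explicitly doesn't care whether tags are real or balanced, the regex approaches in the other answers are probably the better fit; this is only useful if the input is known to be well-formed.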
Upvotes: 0
Reputation: 338416
Replace "<[^>]*>" with the empty string, trim the result, and check whether anything is left afterwards.
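In C#, the steps above could look roughly like this. The class and method names are placeholders I picked for illustration; only the pattern comes from this answer. One small difference from a "<.*?>" pattern: the negated class [^>] also matches newlines, so a tag that spans several lines is stripped too.

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // '<', then any run of characters other than '>', then '>'.
    private static readonly Regex Tag = new Regex("<[^>]*>");

    // True when something other than whitespace remains after removing the tags.
    // Assumes value is not null (Regex.Replace throws on a null input).
    public static bool HasText(string value)
    {
        string stripped = Tag.Replace(value, "");
        return stripped.Trim().Length > 0;
    }
}
```

Called as TagStripper.HasText("<html>Hi</html>"), this should give True, and False for "<html></html>". An empty string comes back as False, which the question lists as "don't care".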
Upvotes: 2