Reputation: 67213
I am parsing some HTML source. Is there a regex to find out whether the alt attributes in an HTML document are empty?
Is regex suitable for this, or should I use string manipulation in C#?
Upvotes: 0
Views: 698
Reputation: 64919
Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
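As a rough illustration of the parser approach, here is a minimal sketch using Python's standard-library `html.parser` (chosen only because it needs no extra packages; the same idea carries over to whichever C# HTML parser you pick). The names `AltChecker` and `check_alts` are made up for this example, and treating a whitespace-only or valueless `alt` as "empty" is an assumption you may want to change.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Collects <img> tags whose alt attribute is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []  # src of images with no alt attribute at all
        self.empty = []    # src of images whose alt is empty or whitespace

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        if "alt" not in attributes:
            self.missing.append(src)
        elif not (attributes["alt"] or "").strip():
            # A bare `alt` with no value is reported as None by the parser.
            self.empty.append(src)


def check_alts(html):
    """Return (missing, empty) lists of image sources for the given HTML."""
    checker = AltChecker()
    checker.feed(html)
    checker.close()
    return checker.missing, checker.empty


sample = """
<img src="a.png" alt="A logo">
<img src="b.png" alt="">
<IMG SRC='c.png' ALT='  '>
<img src="d.png">
<img src="e.png" alt>
"""
missing, empty = check_alts(sample)
print("missing alt:", missing)
print("empty alt:", empty)
```

Because the parser hands you real attributes, quoting style, letter case, and attribute order stop mattering, which is exactly where a regex tends to go wrong.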
Upvotes: 0
Reputation: 5480
You have to parse the HTML and check the tags. The following link includes a C# library for parsing HTML tags; with it you can loop through the tags and count them: Parsing HTML tags.
Upvotes: 2
Reputation: 12616
If you want to do it just looking at the page then CSS selectors might be better, assuming your browser supports the :not selector.
Install the selectorgadget bookmarklet. Activate it on your page, then put the following selector in the input box and press Enter.
img:not([alt])
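If you are automating it and have access to the DOM for the HTML, you could use the same selector. Note that img:not([alt]) matches images with no alt attribute at all; images whose alt is present but empty are matched by img[alt=""].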
Upvotes: 0
Reputation: 25775
If this is valid XHTML, why do you need Regex at all? If you simply search for the string:
alt=""
... you should be able to find all empty alt attributes.
In any case, it shouldn't be too complicated to construct a Regex for the search too, taking into account poorly written HTML markup (especially with spaces):
alt\s*=\s*"\s*"
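For what it's worth, here is a small sketch of that regex idea in Python's `re` module, extended (my assumption, not part of the pattern above) to accept single quotes, ignore case, and skip attributes such as `data-alt`. `EMPTY_ALT` and `count_empty_alts` are hypothetical names for this example. It also shows the main weakness: the pattern has no notion of HTML structure, so it happily matches inside comments.

```python
import re

# "alt" not preceded by a word character or hyphen (so data-alt is skipped),
# optional whitespace around "=", then an empty or whitespace-only value
# enclosed in matching single or double quotes.
EMPTY_ALT = re.compile(r"""(?<![\w-])alt\s*=\s*(["'])\s*\1""", re.IGNORECASE)


def count_empty_alts(html):
    """Count textual occurrences of an empty alt attribute in the string."""
    return len(EMPTY_ALT.findall(html))


samples = [
    '<img src="a.png" alt="">',        # empty, double quotes
    "<img src='a.png' ALT = '  '>",    # whitespace only, single quotes
    '<img src="a.png" alt="logo">',    # not empty
    '<img src="a.png" data-alt="">',   # different attribute
    '<!-- <img alt=""> -->',           # commented out, still matched
]
for s in samples:
    print(count_empty_alts(s), s)
```

A search like this cannot detect images that have no alt attribute at all, and it cannot tell markup from comments or script text, so treat it as a quick check rather than a reliable audit.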
Upvotes: 0