GurdeepS

Reputation: 67213

Getting alt tags with regex

I am parsing some HTML source. Is there a regex to find out whether the alt tags in an HTML document are empty?

Is regex suitable for this or should I use string manipulation in C#?

Upvotes: 0

Views: 698

Answers (4)

Chas. Owens

Reputation: 64919

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
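
As a rough illustration of the parser approach, here is a minimal sketch using Python's standard-library html.parser rather than C#; the same idea should carry over to whichever HTML parser you pick. The AltChecker class, the check_alts helper, and the sample markup are hypothetical names made up for this example, and treating a whitespace-only alt as empty is an assumption on my part.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Collects <img> tags whose alt attribute is empty or missing.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.empty_alt = []    # src of images with alt="" or whitespace only
        self.missing_alt = []  # src of images with no alt attribute at all

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "(no src)")
        if "alt" not in attributes:
            self.missing_alt.append(src)
        elif not (attributes["alt"] or "").strip():
            # A bare `alt` with no value is reported as None by the parser;
            # assumption: that and whitespace-only values count as empty.
            self.empty_alt.append(src)


def check_alts(html):
    """Return (empty_alt, missing_alt) lists of image sources."""
    checker = AltChecker()
    checker.feed(html)
    checker.close()
    return checker.empty_alt, checker.missing_alt


sample = """
<img src="a.png" alt="A logo">
<img src="b.png" alt="">
<img src="c.png" alt="  ">
<img src="d.png">
"""
empty, missing = check_alts(sample)
print(empty)    # ['b.png', 'c.png']
print(missing)  # ['d.png']
```

Keeping "empty" and "missing" apart is a design choice: an empty alt is often deliberate (decorative images), while a missing alt usually is not.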

Upvotes: 0

Ahmy

Reputation: 5480

You have to parse the HTML and check the tags. The following link includes a C# library for parsing HTML tags; with it you can loop through the tags and count them: Parsing HTML tags.

Upvotes: 2

Sam Hasler

Reputation: 12616

If you just want to check by looking at the page, CSS selectors might be better, assuming your browser supports the :not selector.

Install the SelectorGadget bookmarklet. Activate it on your page, then put the following selector in the input box and press Enter.

img:not([alt])

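If you are automating it and have access to the DOM for the HTML, you could use the same selector. Note that this selector matches images with no alt attribute at all; to match images whose alt attribute is present but empty, use img[alt=""].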

Upvotes: 0

Cerebrus

Reputation: 25775

If this is valid XHTML, why do you need Regex at all? If you simply search for the string:

alt=""

... you should be able to find all empty alt tags.

In any case, it shouldn't be too complicated to construct a Regex for the search too, taking into account poorly written HTML markup (especially with spaces):

alt\s*=\s*"\s*"
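
For what it's worth, here is a small sketch of how that pattern might be applied, written in Python's re module purely for illustration (in C# the same pattern should work with the Regex class). The count_empty_alts name is hypothetical, and the IGNORECASE flag is my own addition to cope with uppercase ALT. Keep in mind the pattern only sees double-quoted values, and it will also match things like data-alt="" or text inside comments, which is where a real parser does better.

```python
import re

# The pattern from above; IGNORECASE is an added assumption so that
# ALT="" is caught as well as alt="".
EMPTY_ALT = re.compile(r'alt\s*=\s*"\s*"', re.IGNORECASE)


def count_empty_alts(html):
    """Count occurrences of an empty, double-quoted alt attribute."""
    return len(EMPTY_ALT.findall(html))


html = '<img src="a.png" alt=""><img src="b.png" ALT = " "><img src="c.png" alt="Logo">'
print(count_empty_alts(html))  # 2
```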

Upvotes: 0
