Thomas Anderson

Reputation: 69

Regex help: extracting attribute data from a tag nested inside another tag

First, I'd like to say up front that regex is a suitable solution for this problem: the HTML being parsed is, and always will be, formatted the same way.

The particular piece of HTML I am interested in parsing looks similar to the following:

<a href="" target="" onCick=""><img style="" onmouseover="" onmouseout="" src="" alt="" /></a>

I am interested in pulling the values of the 'src' and 'alt' attributes out of that string. Regex confuses me to the point that I don't really understand what I'm doing with it, so real help would be appreciated. It would mean a lot, thanks.

Upvotes: 1

Views: 79

Answers (1)

AKX

Reputation: 168966

Which language are you using? Regexp dialects have some minor differences.

Either way, for JavaScript you could use

var match = /src="(.*?)"\s+alt="(.*?)"/.exec(pieceOfHTML);
// match[1] should be the src, match[2] the alt

or for Python,

import re

match = re.search(r'src="(.*?)"\s+alt="(.*?)"', pieceOfHTML)
# match.group(1) and match.group(2) respectively (match is None if nothing matched)
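
To try the Python variant end to end, here is a minimal sketch. The sample string, its attribute values, and the helper name `extract_src_alt` are made up for illustration and are not from the question; it also assumes `alt` always directly follows `src`, as in the snippet shown.

```python
import re

# Hypothetical sample in the shape given in the question; the attribute
# values are invented for this example.
SAMPLE = (
    '<a href="page.html" target="_blank" onCick="go()">'
    '<img style="border:0" onmouseover="a()" onmouseout="b()" '
    'src="images/logo.png" alt="Site logo" /></a>'
)

# src="..." followed by whitespace and alt="...", each value captured lazily.
SRC_ALT = re.compile(r'src="(.*?)"\s+alt="(.*?)"')


def extract_src_alt(html):
    """Return (src, alt) from the first match, or None if nothing matches."""
    match = SRC_ALT.search(html)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(extract_src_alt(SAMPLE))
```

Checking for `None` before calling `group()` avoids an `AttributeError` if the markup ever deviates from the expected shape.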

EDIT re comments:

<a href=".*?"\s+target=".*?"\s+onCick=".*?"><img style=".*?"\s+onmouseover=".*?"\s+onmouseout=".*?"\s+src="(.*?)"\s+alt="(.*?)"

should be a decent regexp to match only the pattern required, with lenience regarding whitespace.
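
As a rough sketch of how that stricter pattern could be used from Python: the pattern is split over several string literals purely for readability, and it uses `\s+` between every pair of attributes. The helper name `extract_strict` and the sample strings are assumptions for illustration only.

```python
import re

# The stricter pattern, split across lines. Python concatenates adjacent
# string literals, so this compiles as one single-line pattern.
STRICT = re.compile(
    r'<a href=".*?"\s+target=".*?"\s+onCick=".*?">'
    r'<img style=".*?"\s+onmouseover=".*?"\s+onmouseout=".*?"'
    r'\s+src="(.*?)"\s+alt="(.*?)"'
)


def extract_strict(html):
    """Return (src, alt) only when the whole <a><img> shape is present."""
    match = STRICT.search(html)
    return (match.group(1), match.group(2)) if match else None


# Hypothetical inputs: one in the expected shape, one bare <img>.
wrapped = (
    '<a href="page.html" target="_blank" onCick="go()">'
    '<img style="border:0" onmouseover="a()" onmouseout="b()" '
    'src="images/logo.png" alt="Site logo" /></a>'
)
bare = '<img src="other.png" alt="Not wrapped in a link" />'

print(extract_strict(wrapped))
print(extract_strict(bare))
```

The bare `<img>` is skipped because the pattern requires the surrounding `<a>` tag and the preceding attributes, which is the point of the stricter form.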

Upvotes: 1
