Reputation: 3252
This question shows my ignorance of regular expressions. I've never understood them well enough.
If I wanted to match, for instance, just the URL portion of an <a> tag in HTML, what would I need to do?
My regular expression to get the entire tag is:
<A[^>]*?HREF\s*=\s*[""']?([^'"" >]+?)[ '""]?>
I have no idea how to get the URL out of that, and I don't know where to look in the regular expression documentation to figure it out.
Upvotes: 1
Views: 182
Reputation: 351476
I switched things up a bit - try something like this:
<a[^>]*href="([^"]*).*>
Upvotes: 0
Reputation: 13056
You can use round brackets (parentheses) to group parts of the regular expression match. In this case, put a pair of round brackets around the URL part and then refer to that group later by its number. See here for exactly how to do this.
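As a rough illustration of referring to a group by number, here is a minimal Python sketch. The pattern `HREF_RE` and the sample HTML are made up for this example (they are not the asker's pattern), and it assumes double-quoted `href` values only.

```python
import re

# Hypothetical pattern: the round brackets mark the URL as group 1.
HREF_RE = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>', re.IGNORECASE)

html = '<a href="/one">1</a> and <a id="x" href="/two">2</a>'

# With exactly one group in the pattern, findall returns that group's text
# for every match instead of the whole matched tag.
urls = HREF_RE.findall(html)

# In a replacement string, \1 refers back to group 1 by its number.
rewritten = HREF_RE.sub(r'[\1]', html)

print(urls)
print(rewritten)
```

Other regex libraries spell the back-reference differently (for example `$1`), but the numbering idea is the same.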
Upvotes: 1
Reputation: 156138
Exactly how depends on the regex library you're using, but the way to do it is with a grouped expression. You actually already have one in your example, since grouped expressions are the parenthesized parts. The href attribute value is your first group (your zeroth group is the whole match).
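For instance, in Python's `re` module (one library among many, chosen here only for illustration) the two groups can be read like this. The sketch reuses the pattern from the question with the doubled quotes collapsed to single ones, which does not change what the character classes match, and it assumes case-insensitive matching. The sample HTML is invented.

```python
import re

# The question's pattern; IGNORECASE is an assumption so that lowercase
# <a href=...> tags match as well.
ANCHOR_RE = re.compile(
    r"""<A[^>]*?HREF\s*=\s*["']?([^'" >]+?)[ '"]?>""",
    re.IGNORECASE,
)

html = '<p>See <a class="ext" href="http://example.com/page">this page</a>.</p>'

match = ANCHOR_RE.search(html)
if match:
    whole_tag = match.group(0)  # group 0: everything the pattern matched
    url = match.group(1)        # group 1: the first parenthesized part
    print(whole_tag)
    print(url)
```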
Upvotes: 2
Reputation: 4423
If you are programming in Perl, you can use the $1 capture variable inside an if() statement. For example:
if( $HREF =~ /<A[^>]*?HREF\s*=\s*[""']?([^'"" >]+?)[ '""]?>/ ) {
    print $1;  # $1 holds the text captured by the first parenthesized group
}
Upvotes: 3