Joshua Soileau

Reputation: 3025

Using Regex to Target HREF Attribute

I'm very new to regex.

I want to target everything between the quotes in href="", so that I can quickly parse HTML and replace the contents of link references.

I also want to do the same for img src attributes, but if someone can explain how to do it for href, I should be able to handle other attributes the same way.

If I have this markup:

<a href="http://my.domain/simple-product-2.html" class="product-image"><img src="http://my.domain/media/catalog/product/cache/1/small_image/75x/9df78eab33525d08d6e5fb8d27136e95/images/catalog/product/placeholder/small_image.jpg" width="75" height="75" alt="Simple Product 2" title="Simple Product 2"></a>
<div class="product-details">
    <h3 class="product-name"><a href="http://my.domain/simple-product-2.html">Simple Product 2</a></h3>
    <div class="price-box">
        <span class="regular-price" id="product-price-2-related">
        <span class="price">$42.00</span>                                    </span>
    </div>
    <p><a href="http://my.domain/wishlist/index/add/product/2/form_key/PLOSE4N7mH4kcOgX/" class="link-wishlist">Add to Wishlist</a></p>
</div>

How do I use regex to target the value between the quotes of an attribute like href?

Edit: an example of the expected output:

Given this input

href="http://my.domain/simple-product-2.html"

Produce this output:

href="http://index.html"

Upvotes: 0

Views: 262

Answers (2)

Braj

Reputation: 46871

I want to target everything in between the quotes in href=""

Get the matched group at index 1. The pattern uses a possessive quantifier (*+), as suggested by @lcoderre in the comments below.

href="([^"]*+)"

Here is an online demo.


You can also try this one, which uses a positive lookbehind and a positive lookahead:

(?<=href=").*?(?=")

Online demo
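A minimal PHP sketch of the lookbehind pattern, assuming a short hypothetical `$html` string in place of the question's markup. Because the lookbehind and lookahead do not consume `href="` or the closing quote, the whole match (group 0) is the URL itself, and `preg_replace()` can swap just that value:

```php
<?php
// Hypothetical sample markup, abbreviated from the question
$html = '<a href="http://my.domain/simple-product-2.html">Simple Product 2</a>'
      . '<p><a href="http://my.domain/wishlist/" class="link-wishlist">Add to Wishlist</a></p>';

// (?<=href=")  must be preceded by href="
// .*?          lazily match the attribute value
// (?=")        must be followed by the closing quote
$re = '/(?<=href=").*?(?=")/';

preg_match_all($re, $html, $matches);
// There is no capture group, so the values are in $matches[0]
print_r($matches[0]);

// Only the value is matched, so the replacement leaves href="..." intact
echo preg_replace($re, 'http://index.html', $html), "\n";
```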


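Sample code using the first pattern:

// Capture everything between the quotes of each href="..." (group 1).
// Single quotes keep PHP from treating the inner " as the end of the string.
$re = '/href="([^"]*+)"/';
$str = '...'; // the HTML markup to search

preg_match_all($re, $str, $matches);
print_r($matches[1]); // the captured href values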
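Since the question is ultimately about replacing the values, here is a minimal sketch that adapts the first pattern for `preg_replace_callback()`. The helper `replaceAttributeValue()` is hypothetical, written only to show how the same idea extends to other attributes such as `img src`; the replacement URL `http://index.html` comes from the question's expected output, and the sample markup is abbreviated. Like any regex approach it assumes double-quoted attribute values.

```php
<?php
// Hypothetical helper: replace the value of every attr="..." in $html.
// Assumes double-quoted values; single-quoted or unquoted values are not matched.
function replaceAttributeValue(string $html, string $attr, string $newValue): string
{
    // (?<![\w-]) : the name must not be preceded by a word char or hyphen,
    //              so "href" does not match inside "data-href"
    // [^"]*+     : the value, with a possessive quantifier as in the answer
    $pattern = '/(?<![\w-])' . preg_quote($attr, '/') . '="[^"]*+"/';

    // A callback avoids $ or \ in $newValue being read as backreferences
    return preg_replace_callback(
        $pattern,
        fn(array $m): string => $attr . '="' . $newValue . '"',
        $html
    );
}

$html = '<a href="http://my.domain/simple-product-2.html" class="product-image">'
      . '<img src="http://my.domain/media/small_image.jpg" width="75"></a>';

echo replaceAttributeValue($html, 'href', 'http://index.html'), "\n";
echo replaceAttributeValue($html, 'src', 'http://index.jpg'), "\n";
```

Using a callback rather than a plain replacement string is a deliberate choice: the new value is inserted literally, whatever characters it contains.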

Upvotes: 1

anubhava

Reputation: 786091

Do not use regex to parse HTML. Use a DOM parser in PHP instead:

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$doc->loadHTML($html); // loads your HTML

$nodelist = $doc->getElementsByTagName('a'); // get all the <a> tags
foreach ($nodelist as $node) {
    // getAttribute() returns '' for an <a> without href instead of failing on null
    $val = $node->getAttribute('href');
    echo "href is: $val\n";
}
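The loop above only reads the attribute. For the replacement the question asks about, a minimal sketch along the same DOM lines could use `setAttribute()` and then serialize with `saveHTML()`. The sample markup, `http://index.html` (from the question's expected output) and `http://index.jpg` are assumptions for illustration. The `LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD` flags stop libxml from wrapping the fragment in `<html><body>` and adding a doctype; they work best when the fragment has a single root element, as here.

```php
<?php
// Hypothetical sample fragment with a single root element
$html = '<div><a href="http://my.domain/simple-product-2.html">Simple Product 2</a>'
      . '<img src="http://my.domain/small_image.jpg" alt="Simple Product 2"></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
// Keep the output as a fragment: no implied <html><body>, no doctype
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Replace every href on <a> tags that actually have one
foreach ($doc->getElementsByTagName('a') as $a) {
    if ($a->hasAttribute('href')) {
        $a->setAttribute('href', 'http://index.html');
    }
}

// The same approach covers other attributes, e.g. img src
foreach ($doc->getElementsByTagName('img') as $img) {
    $img->setAttribute('src', 'http://index.jpg');
}

echo $doc->saveHTML();
```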

Upvotes: 4
