Afshin Mansouri

Reputation: 39

PHP regex: get href values that start with a custom URL

I get the content of a page like this:

$html = file_get_contents('example.ir');

Now I want to get the href attributes inside $html whose value is a custom URL followed by some string.

For example, I have three hrefs:

1- href="http://example.ir/salam/ali/...."  => http://example.ir/ + salam/ali/....
2- href="http://example.ir/?id=123/..."     => http://example.ir/ + ?id=123/...
3- href="?kambiz=khare/..."                 => ?kambiz=khare/...

I want numbers 1 and 2, because they have http://example.ir plus some string.

The result has to look like this:

1- http://example.ir/salam/ali/....
2- http://example.ir/?id=123/...

Please help me :)

Upvotes: 2

Views: 1496

Answers (1)

Ro Yo Mi

Reputation: 15000

Description

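This regex matches anchor tags that have an href attribute whose value starts with http://example.ir/, and captures the entire href value into capture group 1. The alternation inside the lookahead steps over each quoted attribute value as a unit, so an href="..." that only appears inside another attribute's value is not picked up.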

<a\b(?=\s) # capture the open tag
(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\shref="(http:\/\/example\.ir\/[^"]*))  # get the href attribute
(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?> # get the entire  tag
.*?<\/a>


Example

Sample Text

Note the last line has a potentially difficult edge case.

<a href="http://example.ir/salam/ali/....">salam ali</a>
<a class="Fonzie" href="http://example.ir/?id=123/...">plus id 123</a>
<a class="Fonzie" href="?kambiz=khare/...">not an http</a>
<a onmouseover=' href="http://example.ir/salam/ali/...." ; funHrefRotater(href) ; " href="?kambiz=khare/...">again not the line we are looking for</a>

Code

This PHP example only shows how the match works.

<?php
$sourcestring = "your source string";
// PREG_SET_ORDER groups the results per match:
// $matches[n][0] is the whole tag, $matches[n][1] is the captured href value.
preg_match_all('/<a\b(?=\s) # capture the open tag
(?=(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*?\shref="(http:\/\/example\.ir\/[^"]*)) # get the href attribute
(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"\s]*)*"\s?> # get the entire tag
.*?<\/a>/imx', $sourcestring, $matches, PREG_SET_ORDER);
echo "<pre>" . print_r($matches, true);
?>
 

Matches

[0][0] = <a href="http://example.ir/salam/ali/....">salam ali</a>
[0][1] = http://example.ir/salam/ali/....
[1][0] = <a class="Fonzie" href="http://example.ir/?id=123/...">plus id 123</a>
[1][1] = http://example.ir/?id=123/...
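
To try this end to end, here is a minimal sketch that runs the same pattern over the sample text and keeps only the captured URLs. The variable names ($sample, $pattern, $links) are just for illustration, and the pattern is written in a nowdoc so the quotes inside it need no escaping. Treat it as a quick check against this sample, not as a general HTML parser.

```php
<?php
// Sample text from above; the last line is the edge case.
$sample = <<<'HTML'
<a href="http://example.ir/salam/ali/....">salam ali</a>
<a class="Fonzie" href="http://example.ir/?id=123/...">plus id 123</a>
<a class="Fonzie" href="?kambiz=khare/...">not an http</a>
<a onmouseover=' href="http://example.ir/salam/ali/...." ; funHrefRotater(href) ; " href="?kambiz=khare/...">again not the line we are looking for</a>
HTML;

// Same pattern as in the answer. A nowdoc keeps backslashes and quotes literal.
$pattern = <<<'REGEX'
/<a\b(?=\s) # capture the open tag
(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\shref="(http:\/\/example\.ir\/[^"]*)) # get the href attribute
(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?> # get the entire tag
.*?<\/a>/imx
REGEX;

// Default (pattern) order: $matches[1] holds capture group 1 of every match.
preg_match_all($pattern, $sample, $matches);
$links = $matches[1];

print_r($links);
```

With this sample, $links should hold only the two http://example.ir/ URLs; the relative href and the href buried in the onmouseover value are skipped.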

Upvotes: 2
