Reputation: 103
I am trying to use preg_match to find URLs inside HTML tags so that I can replace them with the updated domain name. Right now I am just trying to get the search working for href attributes so that I can print the URLs found. Here is what I have:
$matches = array();
$search="domain.com";
preg_match('|(<a\s*[^>]*href=[\'"]?)|',$prod['value'],$matches);
echo '<p>'.$matches[1].'</p>';
$prod['value'] refers to the content that I am trying to sift through.
Upvotes: 1
Views: 7956
Reputation: 6148
Firstly, $matches doesn't need to be defined before the preg_match call. You just have to provide a variable name, and PHP won't so much as throw a notice.
Secondly, $search doesn't seem to be relevant to the question.
Third, bearing in mind that you haven't shown example input, I'm going to assume that you actually want preg_match_all so that you can get a list of all URLs from the input.
Fourth, following on from the third point, that means you need var_dump or print_r instead of echo, as the content of $matches[X] will be an array.
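To make the third and fourth points concrete, here is a minimal sketch of how the shape of $matches differs between the two functions. The sample markup in $html and the variable names $first and $all are made up for illustration; the pattern is the one explained further down in this answer.

```php
<?php
// Hypothetical sample input, not taken from the question.
$html = '<a href="http://a.example/">A</a> <a href="http://b.example/">B</a>';
$pattern = '#<a\s.*?(?:href=[\'"](.*?)[\'"]).*?>#is';

// Neither $first nor $all is declared beforehand; PHP creates them.
preg_match($pattern, $html, $first);
preg_match_all($pattern, $html, $all);

// preg_match stops at the first match, so $first[1] is a string.
echo $first[1], "\n";

// preg_match_all collects every match, so $all[1] is an array of strings
// and needs print_r or var_dump rather than echo.
print_r($all[1]);
```

With preg_match you only ever see the first link, which is why echo appears to work there but hides the remaining URLs.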
Okay, so now for what your regex pattern actually does...
(<a\s*[^>]*href=['"]?)
( - starts a capture group
<a\s* - matches <a followed by 0 or more white space characters
[^>]* - matches 0 or more characters that are not >
href= - matches href=
['"]? - optionally matches either ' or "
) - ends the capture group
This all means that, run against the example input, your regex will match <a href=" from the first link example (google) and <a class="fancyStyle" href=" from the second link example (youtube).
/**
Output from:
preg_match_all('|(<a\s*[^>]*href=[\'"]?)|', $string, $matches);
var_dump($matches);
*/
array(2) {
[0]=>
array(2) {
[0]=>
string(9) "<a href=""
[1]=>
string(28) "<a class="fancyStyle" href=""
}
[1]=>
array(2) {
[0]=>
string(9) "<a href=""
[1]=>
string(28) "<a class="fancyStyle" href=""
}
}
There are a few problems with your code, but the one that is stopping you from getting the expected URL is that you simply stop capturing before you get to it.
The following regex will match URLs that are within the href attribute of a tags.
#<a\s.*?(?:href=['"](.*?)['"]).*?>#is
<a - matches the opening of an a tag
\s.*? - matches a white space character followed by any character 0 or more times
(?: - creates a non-capturing group
href= - matches href=
['"] - matches either ' or "
(.*?) - creates a capture group and matches 0 or more characters before...
['"] - matches ' or "
) - ends the non-capturing group
.*?> - matches any character 0 or more times followed by >
i - makes the regex case insensitive
s - makes . match all characters (including new lines)
preg_match_all('#<a\s.*?(?:href=[\'"](.*?)[\'"]).*?>#is', $string, $matches);
var_dump($matches);
/**
array(2) {
[0]=>
array(2) {
[0]=>
string(34) "<a href="http://www.google.co.uk">"
[1]=>
string(65) "<a class="fancyStyle" href="http://www.youtube.com" id="link136">"
}
[1]=>
array(2) {
[0]=>
string(23) "http://www.google.co.uk"
[1]=>
string(22) "http://www.youtube.com"
}
}
*/
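Since the end goal is to swap the domain rather than just list the URLs, the same idea can be extended with preg_replace_callback. This is only a sketch under assumptions: $oldHost and $newHost are placeholder values, the pattern is a variation of the one above with extra capture groups so the rest of the tag can be put back unchanged, and the input is a shortened copy of the example markup.

```php
<?php
// Placeholder host names; substitute your own old and new domains.
$oldHost = 'www.google.co.uk';
$newHost = 'www.example.org';

$string = '<p><a href="http://www.google.co.uk">link to google</a> and '
        . '<a class="fancyStyle" href="http://www.youtube.com" id="link136">another link</a></p>';

// Group 1: everything up to and including the opening quote of href.
// Group 2: the URL itself. Group 3: the closing quote.
$pattern = '#(<a\s[^>]*?href=[\'"])(.*?)([\'"])#is';

$updated = preg_replace_callback(
    $pattern,
    function (array $m) use ($oldHost, $newHost) {
        // Only the captured URL is changed; the surrounding tag text is kept.
        $url = str_replace('//' . $oldHost, '//' . $newHost, $m[2]);
        return $m[1] . $url . $m[3];
    },
    $string
);

echo $updated, "\n";
```

Links pointing at other hosts (the youtube one here) pass through untouched. For markup that is messier than this example, a regex may miss or mangle some links, so treat this as a starting point rather than a complete solution.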
All of the code above uses the following as input to preg_match_all:
$string = <<<EOC
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title of page</title>
</head>
<body>
<h1>Main Page title</h1>
<p>
The following is a <a href="http://www.google.co.uk">link to google</a>.
This is <a class="fancyStyle" href="http://www.youtube.com" id="link136">another link</a>
</p>
</body>
</html>
EOC;
Upvotes: 3