Reputation: 33
I'm missing something that makes me fail at using recursion with (?R).
Here is an example to explain my problem more clearly:
$str1 = "somes text -start bla bla FIND bla bla bla FIND bla FIND bla end-";
$str2 = "somes text -start bla bla FIND bla bla bla FIND bla FIND bla end-";
$my_pattern = "#-start .*(FIND).* end-#"; // explicit # delimiters; without them PHP takes the leading "-" as the delimiter
preg_replace_callback($my_pattern, 'callback', $str1.$str2);
It will only match the very last FIND.
With the ungreedy (U) option, I only match the first FIND of each string.
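The behaviour described above can be reproduced with preg_match_all. This is a small sketch, not part of the original post: it uses explicit # delimiters, the U modifier for the ungreedy case, and simply counts whole-pattern matches over $str1.$str2.

```php
<?php
// $str1 . $str2 from the question, written out as one string
$input = "somes text -start bla bla FIND bla bla bla FIND bla FIND bla end-"
       . "somes text -start bla bla FIND bla bla bla FIND bla FIND bla end-";

// Greedy: the first .* runs to the end of the input and backtracks only to the
// LAST FIND, so a single match spans both blocks
$greedy = preg_match_all('#-start .*(FIND).* end-#', $input, $greedyMatches);

// Ungreedy (U modifier): .* becomes lazy, so each block gives one match whose
// group captures only that block's FIRST FIND
$lazy = preg_match_all('#-start .*(FIND).* end-#U', $input, $lazyMatches);

echo "greedy matches: $greedy\n";
echo "ungreedy matches: $lazy\n";
```

Either way, each match consumes at least one whole block, so the single capturing group can hold only one FIND per match; that is why neither version returns every FIND.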
But how can I get all of them? I tried to use (?R), but I don't really understand how it works.
Thanks.
EDIT: The real task is to find all the title attributes between <a> and </a>.
I know it's not optimal to use regex to parse HTML, but this is just a school exercise for learning regex.
That's why I didn't post the real task: I wanted to understand it and be able to do it myself.
<html>
<head><title>Nice page</title></head>
<body>
Hello World
<a href=http://cyan.com title="a link">
this is a link
</a>
<br />
<a href=http://www.riven.com> Here too <img src=wrong.image title="and again">
<span>Even that<div title="same">all the same</div></span>
</a>
</body>
</html>
My job is to put every title in uppercase (title="A LINK", for example) using regex.
My last pattern was:
#<a .* title=\"(.*)\".*</a>#Uis
It made me catch title="a link" and title="and again". Your method (stribizhev) should work, but I haven't managed to implement it yet; I'm still working on it.
Upvotes: 1
Views: 163
Reputation: 626728
You need to use DOMDocument with DOMXPath to safely get all title attributes and change them with mb_strtoupper:
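$html = "<<YOUR_HTML>>";
$dom = new DOMDocument('1.0', 'UTF-8');
// Parse without adding implied <html>/<body> wrappers or a default DOCTYPE
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Every <a> element that has a title attribute
$titles = $xpath->query('//a[@title]');
foreach ($titles as $title) {
    // Overwrite the attribute with its uppercase value
    $title->setAttribute("title", mb_strtoupper($title->getAttribute("title"), 'UTF-8'));
}
echo $dom->saveHTML();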
See IDEONE demo.
The //a[@title] XPath selects all <a> elements that have a title attribute.
I use mb_strtoupper assuming you have UTF-8 input. Please adjust accordingly, or, if you are not planning to use Unicode, just use strtoupper.
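Note that //a[@title] only reaches title attributes on the <a> elements themselves, while the question's sample also has titles on an <img> and a <div> nested inside the second <a>. A minimal variant, assuming those nested titles should be uppercased too, widens the query with the descendant-or-self axis. It uses plain strtoupper because the sample titles are ASCII, and libxml_use_internal_errors(true) only keeps libxml parse warnings off the output.

```php
<?php
// Sample markup from the question
$html = <<<HTML
<html>
<head><title>Nice page</title></head>
<body>
Hello World
<a href=http://cyan.com title="a link">
this is a link
</a>
<br />
<a href=http://www.riven.com> Here too <img src=wrong.image title="and again">
<span>Even that<div title="same">all the same</div></span>
</a>
</body>
</html>
HTML;

// Collect libxml parse warnings instead of printing them
libxml_use_internal_errors(true);

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Every element that is an <a> or sits inside one, and carries a title attribute
$nodes = $xpath->query('//a/descendant-or-self::*[@title]');
foreach ($nodes as $node) {
    // strtoupper is enough for this ASCII sample; use mb_strtoupper for UTF-8 text
    $node->setAttribute('title', strtoupper($node->getAttribute('title')));
}

$out = $dom->saveHTML();
echo $out;
```

The <title>Nice page</title> element is left alone, since the query only matches title attributes.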
Here is a regex that will let you replace all FIND substrings between -start and end-:
(-start|(?!^)\G)(.*?)FIND(?=.*end-)
See demo
Replace with $1$2NEW_WORD.
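$re = '#(-start|(?!^)\G)(.*?)FIND(?=.*end-)#';
$str = "somes text -start bla bla FIND bla bla bla FIND bla FIND bla end-";
$subst = '$1$2NEW_WORD'; // single quotes: PHP never tries to interpolate $1/$2
$result = preg_replace($re, $subst, $str);
echo $result; // somes text -start bla bla NEW_WORD bla bla bla NEW_WORD bla NEW_WORD bla end-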
NOTE: If you have several -start...end- blocks, you will most probably need a tempered greedy token (?:(?!-start|end-|FIND).)* instead of .*?, and a token that cannot cross -start, such as (?:(?!-start).)*, instead of the .* in the lookahead.
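A sketch of the multi-block case, using a made-up input with a stray FIND between two blocks. Group 2 uses the tempered token (?:(?!-start|end-|FIND).)*; the lookahead instead uses (?:(?!-start).)*, because a lookahead that may not cross FIND would reject every FIND except the last one in each block.

```php
<?php
// Two -start ... end- blocks with a stray FIND between them that must stay untouched
$str = "-start a FIND b end- FIND -start c FIND d end-";

// Plain version: after the first block, \G + .*? walks over "end-" to the stray FIND
$plain = '#(-start|(?!^)\G)(.*?)FIND(?=.*end-)#';

// Tempered version: group 2 cannot cross -start, end- or FIND, and the lookahead
// cannot cross -start, so each block is handled on its own
$tempered = '#(-start|(?!^)\G)((?:(?!-start|end-|FIND).)*)FIND(?=(?:(?!-start).)*end-)#';

$plainResult = preg_replace($plain, '$1$2NEW_WORD', $str);
$temperedResult = preg_replace($tempered, '$1$2NEW_WORD', $str);

echo $plainResult, "\n";    // the stray FIND is replaced as well
echo $temperedResult, "\n"; // only the FINDs inside the blocks are replaced
```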
The regex breakdown:
(-start|(?!^)\G) - this group contains two alternatives:
  -start - matches the literal string -start
  (?!^)\G - asserts the position in the input string right after the last successful match. \G can also assert the beginning of the string, but we exclude that with the negative lookahead.
(.*?) - matches any number of characters, but as few as possible
FIND - the literal string FIND
(?=.*end-) - requires the literal string end- somewhere after the FIND.
For more information on the \G operator, see When is \G useful application in a regex? and What good is \G in a regular expression?.
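Adapted to the title task from the question's EDIT, the same \G chaining might look like the sketch below. It is an illustration under assumptions: title values are double-quoted, <a> elements are not nested, and strtoupper is used because the sample is ASCII. The first alternative starts at an <a tag, the \G alternative continues right after the previous title, and the tempered (?:(?!</a>).)*? stops the chain from leaking past </a>.

```php
<?php
// Sample markup from the question
$html = <<<HTML
<html>
<head><title>Nice page</title></head>
<body>
Hello World
<a href=http://cyan.com title="a link">
this is a link
</a>
<br />
<a href=http://www.riven.com> Here too <img src=wrong.image title="and again">
<span>Even that<div title="same">all the same</div></span>
</a>
</body>
</html>
HTML;

// (<a\b|(?!^)\G)      start at an <a tag, or continue right after the previous match
// ((?:(?!</a>).)*?)   lazily skip text, but never past the closing </a>
// \btitle="([^"]*)"   a double-quoted title attribute; its value is group 3
$re = '#(<a\b|(?!^)\G)((?:(?!</a>).)*?)\btitle="([^"]*)"#is';

$result = preg_replace_callback($re, function (array $m): string {
    // Keep everything before the attribute and uppercase only its value
    return $m[1] . $m[2] . 'title="' . strtoupper($m[3]) . '"';
}, $html);

echo $result;
```

The s modifier lets . cross the line breaks inside each <a> element, and i makes the tag and attribute names case-insensitive.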
Upvotes: 1
Reputation: 1
If you are using preg_replace_callback, why wouldn't the reluctant .*? be convenient?
$my_pattern = "/-start(.*?)end-/s";
$str = preg_replace_callback($my_pattern, function($matches) {
    return str_replace("FIND", "<b>FIND</b>", $matches[0]);
}, $str1.$str2);
Or do something else in the callback. What are you trying to achieve?
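For the title task from the question's EDIT, "something else in the callback" could be a second replacement run on each matched <a>...</a> block. A minimal sketch, assuming double-quoted title values and no nested <a> elements; strtoupper is used because the sample text is ASCII.

```php
<?php
// Sample markup from the question
$html = <<<HTML
<html>
<head><title>Nice page</title></head>
<body>
Hello World
<a href=http://cyan.com title="a link">
this is a link
</a>
<br />
<a href=http://www.riven.com> Here too <img src=wrong.image title="and again">
<span>Even that<div title="same">all the same</div></span>
</a>
</body>
</html>
HTML;

// Outer pattern: one whole <a ...> ... </a> element per match (lazy, dot matches newlines)
$result = preg_replace_callback('#<a\b.*?</a>#is', function (array $block): string {
    // Inner pattern: every double-quoted title attribute inside that element
    return preg_replace_callback('#\btitle="([^"]*)"#i', function (array $m): string {
        return 'title="' . strtoupper($m[1]) . '"';
    }, $block[0]);
}, $html);

echo $result;
```

Splitting the work this way keeps each regex simple: the outer one only finds the blocks, and the inner one never has to know where a block ends.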
Upvotes: 0