Reputation: 3039
I have HTML content like the following:
$html = "My name is Varun-Kumar. My webpage is <a href='http://varundeboss.com/varun-home-page'>Varundeboss</a> Also http://varundeboss.home.com/varun-home-page";
Now I want to remove every occurrence of "-" from the HTML, except where it occurs inside an anchor tag or in links starting with "http://", "https://" or "www."
I can do this for the anchor tag using the following code:
$result = preg_replace('%-(?![^<]*</a>)%i', '', $html);
Can someone help me change this regex so that it also covers the "http://", "https://" and "www." cases?
Appreciate the help!
Thanks, Varun
Upvotes: 3
Views: 75
Reputation: 89557
You can use this pattern:
$result = preg_replace('~(?:https?:\S+|<a\b[^>]*)(*SKIP)(?!)|-~i', ' ', $html);
The idea is to match what you want to avoid before trying to match the -. You then force that branch to fail with (?!), which can never match, and use (*SKIP) so that the regex engine does not retry at any position inside the text it has just consumed.
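To see why (*SKIP) is needed and (?!) alone is not enough, here is a minimal sketch. The input string and variable names are made up for illustration; only the two verbs come from the pattern above.

```php
<?php
// Hypothetical input: one hyphen outside a URL, one inside it.
$text = "a-b http://x-y";

// Without (*SKIP): once (?!) fails, the engine retries at later offsets
// inside the URL and eventually matches the hyphen there.
$withoutSkip = preg_replace('~https?:\S+(?!)|-~', '', $text);

// With (*SKIP): the next attempt starts after the URL, so its hyphen survives.
$withSkip = preg_replace('~https?:\S+(*SKIP)(?!)|-~', '', $text);

echo $withoutSkip, PHP_EOL; // ab http://xy
echo $withSkip, PHP_EOL;    // ab http://x-y
```

In both runs the hyphen in "a-b" is removed; only the hyphen inside the URL differs.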
The advantage of this method is that you can freely choose the replacement string without having to use preg_replace_callback(), which would look like this:
$result = preg_replace_callback('~(https?:\S+|<a\b[^>]*)|-~i',
    // Group 1 is not set when only the hyphen matched, so test it with isset().
    function ($m) { return isset($m[1]) ? $m[1] : ' '; },
    $html);
In both examples you can easily add whatever else you want to protect: img tags, www. links, etc.
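For instance, the first pattern could be extended to cover "www." links roughly as follows. This is a sketch, not a definitive solution: the extra www.varun-site.com link in the sample string is invented for the demonstration, and the replacement is the empty string because the question asks to remove the hyphens. Note that only the opening <a ...> tag is protected here, not the link text between <a> and </a>.

```php
<?php
// Sample from the question, plus a made-up "www." link for illustration.
$html = "My name is Varun-Kumar. My webpage is "
      . "<a href='http://varundeboss.com/varun-home-page'>Varundeboss</a> "
      . "Also http://varundeboss.home.com/varun-home-page and www.varun-site.com";

// Branches to skip: http(s) URLs, www. links, and opening <a ...> tags.
// Whatever is left for the last branch is a hyphen that should go.
$pattern = '~(?:https?://\S+|\bwww\.\S+|<a\b[^>]*)(*SKIP)(?!)|-~i';

$result = preg_replace($pattern, '', $html);

echo $result, PHP_EOL;
```

With this input, "Varun-Kumar" becomes "VarunKumar", while the hyphens inside the href, the bare http:// URL and the www. link are left alone.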
Upvotes: 1