Varun kumar

Reputation: 3039

Regex OR functionality

I have HTML content such as the following:

$html = "My name is Varun-Kumar. My webpage is <a href='http://varundeboss.com/varun-home-page'>Varundeboss</a> Also http://varundeboss.home.com/varun-home-page";

Now I want to remove all occurrences of "-" from the HTML, except when it occurs within an anchor tag or in links starting with "http://", "https://" or "www."

I can do this for the anchor tag using the following code:

$result = preg_replace('%-(?![^<]*</a>)%i', '', $html);

Can someone help me change this regex to also cover the "http://", "https://" and "www." cases?

Appreciate the help!

Thanks, Varun

Upvotes: 3

Views: 75

Answers (1)

Casimir et Hippolyte

Reputation: 89557

You can use this pattern:

$result = preg_replace('~(?:https?:\S+|<a\b[^>]*)(*SKIP)(?!)|-~i', ' ', $html);

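The idea is to match what you want to avoid before trying to match the -. You then force that branch to fail with (?!), which is always false, and (*SKIP) keeps the engine from retrying at any position inside the text it has just matched, so the hyphens in that text are never reached.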
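
As a minimal sketch of that skip-and-fail idea, the snippet below runs the pattern on the string from the question. It assumes an empty replacement, since the question asks to remove the hyphen rather than swap it for a space, and it only protects the opening <a ...> tag and bare http/https URLs:

```php
<?php
// Sample string taken from the question.
$html = "My name is Varun-Kumar. My webpage is <a href='http://varundeboss.com/varun-home-page'>Varundeboss</a> Also http://varundeboss.home.com/varun-home-page";

// Left branch: consume a URL or an opening <a ...> tag, then fail on purpose.
// (*SKIP) makes the next match attempt start after the consumed text.
// Right branch: a hyphen anywhere else, which is the only thing replaced.
$pattern = '~(?:https?:\S+|<a\b[^>]*)(*SKIP)(?!)|-~i';

// Assumption: an empty replacement, to remove the hyphen entirely.
$result = preg_replace($pattern, '', $html);

echo $result, "\n";
```

Only the hyphen in "Varun-Kumar" should be affected here; the hyphens inside the href attribute and in the bare URL are left alone.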

The advantage of this method is that you can freely choose the replacement for the target string without resorting to preg_replace_callback(), which would look like this:

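$result = preg_replace_callback('~(https?:\S+|<a\b[^>]*)|-~i',
                                // Group 1 is set only when a URL or <a ...> tag matched; return it unchanged.
                                // isset() avoids an undefined-index warning when only "-" matched.
                                function ($m) { return isset($m[1]) ? $m[1] : ' '; },
                                $html);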

In both examples you can easily add whatever else you want to protect: img tags, www. links, etc.
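
For instance, one possible way to cover the "www." case from the question is to add another alternative to the protected branch. This is only a sketch: the function name remove_hyphens_outside_links is hypothetical, the replacement is assumed to be an empty string, and \S+ runs up to the next whitespace, so any text glued directly to a URL is protected as well.

```php
<?php
// Hypothetical helper, not part of the original answer.
function remove_hyphens_outside_links(string $html): string
{
    // Protected branch: http(s) URLs, www. links, and opening <a ...> tags.
    // Anything matched there is skipped; a hyphen elsewhere is removed.
    $pattern = '~(?:https?://\S+|www\.\S+|<a\b[^>]*)(*SKIP)(?!)|-~i';

    // Assumption: remove the hyphen (empty replacement).
    return preg_replace($pattern, '', $html);
}

echo remove_hyphens_outside_links(
    "Visit www.my-site.com or https://a-b.example/x-y for well-known tips"
), "\n";
```

Further alternatives, such as <img\b[^>]* for img tags, could be added to the same group in the same way.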

Upvotes: 1
