GrumpyCrouton

Reputation: 8621

Regex to ignore matches that have a specific string in front of it

I have this string stored in PHP:

Keyboard layout codes found here https://msdn.microsoft.com/en-us/library/cc233982.aspx test 123

test https://google.com

test google.com

<img src='http://example.com/pages/projects/uploader/files/2017-06-16%2011_27_36-Settings.png'>Link Converted to Image</img>

The img element was created by a previous regex:

$url = '~(https|http)?(://)((\S)+(png|jpg|gif|jpeg))~';
$output = preg_replace($url, "<img src='$0'>Link Converted to Image</img>", $output);

Now I want to convert the regular links into <a> elements.

I have this regex, which works except for one problem.

$url = '~(https|http)?(://)?((\S)+[.]+(\w*))~';
$output = preg_replace($url, "<a href='$0'>$0</a>", $output);

This regex also converts the link that has already been turned into an img element, so it puts an <a> element inside the src attribute of the img element. My idea for avoiding this is to skip any match that starts with src=', but I can't figure out how to actually do this.

Am I going about this the wrong way? What is the most common or efficient way to accomplish this?
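The idea of skipping matches that start with src=' is reasonable, because the link regex above really does start matching at src=' (the greedy (\S)+ swallows it). A guard at the start of the pattern is not enough on its own, though: once the guard rejects that position, the engine simply retries one character later. A minimal sketch showing this, where the (?!src=') guard is a hypothetical attempt and not code from the question:

```php
<?php
// The img element produced by the question's first regex.
$img = "<img src='http://example.com/pages/projects/uploader/files/2017-06-16%2011_27_36-Settings.png'>Link Converted to Image</img>";

// The question's link regex with a hypothetical (?!src=') guard in front,
// meant to reject any match that starts with src='.
$guarded = '~(?!src=\')(https|http)?(://)?((\S)+[.]+(\w*))~';

preg_match($guarded, $img, $m);

// The match is rejected at "src='", but the engine retries at the next
// character and matches from "rc='" onwards instead.
echo $m[0], "\n";
```

The match now begins at rc=' instead of src=', so the link inside the img element is still converted. This is why the answers below consume the whole img element and skip past it.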

Upvotes: 1

Views: 244

Answers (2)

revo

Reputation: 48751

Adding to @Jan's answer: although this workaround has some drawbacks, it also matches URL-like strings without a scheme, such as google.com:

<img.+?</img>(*SKIP)(*FAIL)|(?:https?\S+|(?:(?!:)(?(1)\S|(\w)))*\.\w{2,5})


Breakdown:

(?:             # Open a non-capturing group (a)
    (?!:)       # Next immediate character shouldn't be a colon `:`
    (?(1)\S     # If capturing group #1 has matched, match a non-whitespace character
    |           # otherwise
    (\w))       # match and capture a word character (a URL begins with a word character)
)*              # As many times as possible (this cluster forms a tempered pattern)
\.\w{2,5}       # Match TLD

Drawbacks:

  1. The TLD is limited to 2 to 5 characters (\w{2,5})
  2. Scheme-less URLs that contain a port number (e.g. example.com:8080) are only partially matched
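The pattern above is given without PHP code. A minimal sketch of using it with preg_replace, assuming the sample text from the question, the ~ delimiters, and the same <a href='$0'>$0</a> replacement as in Jan's answer (none of these are part of the original pattern):

```php
<?php
// Sample text from the question; the img element was produced earlier.
$string = <<<DATA
Keyboard layout codes found here https://msdn.microsoft.com/en-us/library/cc233982.aspx test 123

test https://google.com

test google.com

<img src='http://example.com/pages/projects/uploader/files/2017-06-16%2011_27_36-Settings.png'>Link Converted to Image</img>
DATA;

// revo's pattern wrapped in ~ delimiters: <img>...</img> is matched and thrown
// away with (*SKIP)(*FAIL); the second branch matches http(s) links and
// scheme-less URL-like strings such as google.com.
$regex = '~<img.+?</img>(*SKIP)(*FAIL)|(?:https?\S+|(?:(?!:)(?(1)\S|(\w)))*\.\w{2,5})~';

// "$0" is not interpolated by PHP (variable names can't start with a digit),
// so preg_replace receives the literal backreference $0.
$output = preg_replace($regex, "<a href='$0'>$0</a>", $string);
echo $output;
```

All three links become <a> elements and the img element is left untouched. Note that a scheme-less match produces a relative href (href='google.com'), so you may want to prepend a scheme in that case.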

Upvotes: 1

Jan

Reputation: 43189

This is a good use case for (*SKIP)(*FAIL):

<img.+?</img>(*SKIP)(*FAIL) # match <img> tags and throw them away
|                           # or
\bhttps?\S+\b               # a link starting with http/https


In PHP:

<?php

$string = <<<DATA
Keyboard layout codes found here https://msdn.microsoft.com/en-us/library/cc233982.aspx test 123

test https://google.com

<img src='http://example.com/pages/projects/uploader/files/2017-06-16%2011_27_36-Settings.png'>Link Converted to Image</img>
DATA;

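// Match each <img>...</img> first and discard it with (*SKIP)(*FAIL);
// any other http/https link is wrapped in an a element.
$regex = '~<img.+?</img>(*SKIP)(*FAIL)|\bhttps?\S+\b~';

$string = preg_replace($regex, "<a href='$0'>$0</a>", $string);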
echo $string;
?>
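To see what the (*SKIP)(*FAIL) branch actually changes, a minimal sketch comparing the link pattern with and without it on the same sample text, using preg_match_all to list the matches (the variable names are only for illustration):

```php
<?php
// Same sample text: two bare links plus one already-converted img element.
$string = <<<DATA
Keyboard layout codes found here https://msdn.microsoft.com/en-us/library/cc233982.aspx test 123

test https://google.com

<img src='http://example.com/pages/projects/uploader/files/2017-06-16%2011_27_36-Settings.png'>Link Converted to Image</img>
DATA;

// Without the skip branch, the URL inside the img element is matched too.
preg_match_all('~\bhttps?\S+\b~', $string, $plain);

// With the skip branch, the engine consumes each <img>...</img>, (*FAIL) forces
// that attempt to fail, and (*SKIP) makes the next attempt start after the tag
// instead of somewhere inside it.
preg_match_all('~<img.+?</img>(*SKIP)(*FAIL)|\bhttps?\S+\b~', $string, $skipped);

print_r($plain[0]);   // three matches, the last one taken from the src attribute
print_r($skipped[0]); // only the two links outside the img element
```

The third match in the first list runs from http://example.com/... up to the word Link, which is exactly the corruption described in the question.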

Upvotes: 2
