Ian Fako

Reputation: 1210

Regular expression for links with no dot at the end

I'm looking to create a regex that matches links without a dot at the end. I know an FQDN always has the root dot at the end, but I'm working on a blog service. I need to process blog posts, and apparently some users finish their post with a link followed by a dot to end the sentence.

Those texts look something like:

Example text... https://example.com/site. More text here...

The problem is that the trailing dot becomes part of the link, so the link doesn't point to any webpage. With the help of this question I made this PHP function:

function modifyText($text) {
    $url = '/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/';
    $string= preg_replace($url, '<a href="$0" target="_blank">$0</a>', $text);
    return $string;
}

With the example from above this code generates

Example text... <a href="https://example.com/site." target="_blank">https://example.com/site.</a> More text here...

but it should generate

Example text... <a href="https://example.com/site" target="_blank">https://example.com/site</a>. More text here...

Upvotes: 1

Views: 944

Answers (2)

The fourth bird

Reputation: 163457

Another option is to use a negative lookbehind (?<!\.) after the \S* to assert that the character to the left is not a dot:

https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:\/\S*(?<!\.))?


If you don't need the capturing groups (), you can turn them into non-capturing groups (?:).

You don't have to escape the forward slash \/ if you use a delimiter other than /, for example ~.

For example:

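function modifyText($text) {
    // With "~" as the delimiter, the forward slashes need no escaping.
    // (?<!\.) makes \S* back off until the match no longer ends in a dot.
    $url = '~https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:/\S*(?<!\.))?~';
    return preg_replace($url, '<a href="$0" target="_blank">$0</a>', $text);
}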

echo modifyText("Example text... https://example.com/site. More text here... https://example.com/site");

Result

Example text... <a href="https://example.com/site" target="_blank">https://example.com/site</a>. More text here... <a href="https://example.com/site" target="_blank">https://example.com/site</a>
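
To see how the lookbehind treats a few edge cases, here is a minimal sketch. The helper name extractUrls and the sample sentences are assumptions made for illustration; only the pattern comes from this answer. Because \S* is greedy, the lookbehind forces it to backtrack past every trailing dot, while dots inside the path are left alone.

```php
<?php
// Hypothetical helper (name assumed): collects every URL matched by the
// lookbehind pattern from this answer.
function extractUrls(string $text): array {
    $pattern = '~https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:/\S*(?<!\.))?~';
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // index 0 holds the full matches
}

// Several trailing dots are all left out of the match.
print_r(extractUrls('See https://example.com/site... and more'));

// Dots inside the path are kept; only the final one is dropped.
print_r(extractUrls('Get https://example.com/files/a.b.html. Done'));
```

The first call should return a single entry, https://example.com/site, and the second https://example.com/files/a.b.html.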

Upvotes: 2

CertainPerformance

Reputation: 370989

One option is to lazily repeat non-space characters at the end of the pattern, then look ahead for zero or more dots followed by whitespace or the end of the string:

'/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/\S*?(?=\.*(?:\s|$)))?/i'

https://regex101.com/r/4VEWjW/2
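
As a rough sketch, the pattern can be dropped into a function shaped like the one in the question. The name linkifyLazy is an assumption for illustration; the pattern itself is the one shown above.

```php
<?php
// Hypothetical wrapper (name assumed) around the lazy-quantifier pattern.
function linkifyLazy(string $text): string {
    // \S*? grows one character at a time until only dots remain
    // before the next whitespace character or the end of the string.
    $pattern = '/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/\S*?(?=\.*(?:\s|$)))?/i';
    return preg_replace($pattern, '<a href="$0" target="_blank">$0</a>', $text);
}

echo linkifyLazy('Example text... https://example.com/site. More text here...');
```

With the question's sample sentence, the dot should end up outside the closing </a> tag.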

You could also avoid the lazy quantifier by repeating optional dots followed by characters that are neither dots nor whitespace:

'/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/(?:\.*[^\s.]+)*(?=\.*(?:\s|$)))?/i'

Upvotes: 1
