Reputation: 1210
I'm looking to create a regex that matches links without a trailing dot. I know an FQDN always has the root dot at the end, but I'm working on a blog service. I need to process blog posts, and apparently some users finish their post with a link and then a dot to end the sentence.
Those texts look something like:
Example text... https://example.com/site. More text here...
The problem here is that this doesn't link to any webpage. With the help of this question I made this PHP function:
function modifyText($text) {
    $url = '/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/';
    $string = preg_replace($url, '<a href="$0" target="_blank">$0</a>', $text);
    return $string;
}
With the example from above this code generates
Example text... <a href="https://example.com/site." target="_blank">https://example.com/site.</a> More text here...
but it should generate
Example text... <a href="https://example.com/site" target="_blank">https://example.com/site</a>. More text here...
Upvotes: 1
Views: 944
Reputation: 163457
Another option is to use a negative lookbehind (?<!\.) after the \S* to assert that what is on the left is not a dot:
https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:\/\S*(?<!\.))?
If you don't need the capturing groups (), you could turn them into non-capturing groups (?:). You don't have to escape the forward slash \/ if you use a delimiter other than /, for example ~.
For example:
function modifyText($text) {
    // With ~ as the delimiter, the forward slash needs no escaping.
    $url = '~https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:/\S*(?<!\.))?~';
    $string = preg_replace($url, '<a href="$0" target="_blank">$0</a>', $text);
    return $string;
}
echo modifyText("Example text... https://example.com/site. More text here... https://example.com/site");
Result
Example text... <a href="https://example.com/site" target="_blank">https://example.com/site</a>. More text here... <a href="https://example.com/site" target="_blank">https://example.com/site</a>
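To see how the lookbehind behaves beyond the single example, here is a small sketch that runs the same pattern over a few more inputs. The function name linkifyWithLookbehind and the sample strings are made up for illustration. Because \S* is greedy and the lookbehind is checked after it, the engine backtracks over every trailing dot, while dots inside the path are kept.

```php
<?php
// Hypothetical helper name; the pattern is the lookbehind variant from this answer.
function linkifyWithLookbehind(string $text): string
{
    // \S* grabs the whole path, then (?<!\.) forces backtracking
    // until the last matched character is not a dot.
    $pattern = '~https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:/\S*(?<!\.))?~';
    return preg_replace($pattern, '<a href="$0" target="_blank">$0</a>', $text);
}

// Illustrative inputs: several trailing dots, a dot inside the path,
// and a bare host followed by a sentence-ending dot.
$samples = [
    'See https://example.com/site... for details',
    'Docs: https://example.com/index.html.',
    'Bare host https://example.com. Next sentence',
];

foreach ($samples as $sample) {
    echo linkifyWithLookbehind($sample), PHP_EOL;
}
```

In all three samples the trailing dots should end up outside the generated anchor, and index.html keeps its inner dot.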
Upvotes: 2
Reputation: 370989
One option would be to lazily repeat non-space characters at the end, and look ahead for zero or more dots followed by whitespace or the end of the string:
'/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/\S*?(?=\.*(?:\s|$)))?/i'
https://regex101.com/r/4VEWjW/2
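As a rough illustration of the lazy-lookahead pattern in PHP, the sketch below collects the matched URLs with preg_match_all. The helper name extractUrlsLazy and the sample text are assumptions for the demo, not part of the original answer.

```php
<?php
// Hypothetical helper; returns every URL matched by the lazy-lookahead pattern.
function extractUrlsLazy(string $text): array
{
    // \S*? expands one character at a time and stops as soon as only dots
    // (possibly none) remain before whitespace or the end of the string.
    $pattern = '/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/\S*?(?=\.*(?:\s|$)))?/i';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = 'A https://example.com/site. B https://example.com/a.b/c.html... end https://example.com/x';
print_r(extractUrlsLazy($text));
```

Dots inside the path (a.b, c.html) survive because the lookahead only succeeds once nothing but dots stands between the current position and the next whitespace.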
You could also repeat dots followed by non-dot, non-space characters, to avoid the lazy quantifier:
'/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/(?:\.*[^\s.]+)*(?=\.*(?:\s|$)))?/i'
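Here is a sketch of the dots-then-non-dots idea in PHP. It assumes the repeated class should exclude whitespace as well as dots (written as [^\s.]) and that the dots-plus-characters unit may repeat, so the match cannot run past the end of the URL and paths such as index.html are kept whole. The helper name extractUrlsGreedy and the sample text are invented for the demo.

```php
<?php
// Hypothetical helper; greedy variant that avoids the lazy quantifier.
function extractUrlsGreedy(string $text): array
{
    // Each repetition consumes optional dots followed by at least one
    // character that is neither whitespace nor a dot, so trailing dots
    // can never be the last thing matched.
    $pattern = '/https?:\/\/[a-z0-9.-]+\.[a-z]{2,3}(\/(?:\.*[^\s.]+)*(?=\.*(?:\s|$)))?/i';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = 'Read https://example.com/site and more. Then https://example.com/index.html.';
print_r(extractUrlsGreedy($text));
```

The first URL should stop at the space after site, and the second should keep index.html while leaving the final dot out.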
Upvotes: 1