Brian Leishman

Reputation: 8575

PHP URL to Link with Regex

I know I've seen this done in a lot of places, but I need something a little different from the norm. Sadly, when I search for it, the results are buried in posts about simply turning a URL into an HTML link. I want the PHP function to strip the "http://" or "https://" from the link text, as well as anything after the domain, so basically I am looking to turn A into B.

A: http://www.youtube.com/watch?v=spsnQWtsUFM
B: <a href="http://www.youtube.com/watch?v=spsnQWtsUFM">www.youtube.com</a>

If it helps, here is my current PHP regex replace function.

ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]", "<a href=\"\\0\" class=\"bwl\" target=\"_new\">\\0</a>", htmlspecialchars($body, ENT_QUOTES));

It would probably also help to say that I have absolutely no understanding of regular expressions. Thanks!

EDIT: When I enter a comment like this: blahblah https://www.facebook.com/?sk=ff&ap=1 blah, I get HTML like this: <a class="bwl" href="blahblah https://www.facebook.com/?sk=ff&amp;ap=1 blah">www.facebook.com</a>, which doesn't work at all because it takes the text around the link with it. It works great if the comment contains only a link, however. This happened after I changed the function to this:

preg_replace("#^(.*)//(.*)/(.*)$#",'<a class="bwl" href="\0">\2</a>',  htmlspecialchars($body, ENT_QUOTES));

Upvotes: 6

Views: 8138

Answers (7)

Christoffer

Reputation: 1

The regex-based code does not work completely.

I wrote this code instead. It is much more extensive, but it works:

See the result here: http://cht.dk/data/php-scripts/inc_functions_links.php

See the source code here: http://cht.dk/data/php-scripts/inc_functions_links.txt

Upvotes: 0

daalbert

Reputation: 1475

$result = preg_replace('%(http[s]?://)(\S+)%', '<a href="\1\2">\2</a>', $subject);

Upvotes: 0

Mark Eirich

Reputation: 10114

I'm surprised no one remembers PHP's parse_url function:

$url = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
echo parse_url($url, PHP_URL_HOST); // displays "www.youtube.com"

I think you know what to do from there.
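
For whole comments, one possible way to combine this with a regex is sketched below. It is only an illustration: the helper name linkify_hosts and the simple URL pattern are assumptions, not something from the question.

```php
<?php
// Hypothetical helper: wraps each http(s) URL in a link whose text is the host only.
function linkify_hosts(string $text): string
{
    // Escape first so the surrounding text is safe to output as HTML.
    $escaped = htmlspecialchars($text, ENT_QUOTES);

    $result = preg_replace_callback(
        '#https?://[^\s<>]+#i', // assumed pattern: scheme, then anything up to whitespace or < >
        function (array $m): string {
            // $m[0] is the escaped URL; decode it so parse_url() sees the real one.
            $url  = htmlspecialchars_decode($m[0], ENT_QUOTES);
            $host = parse_url($url, PHP_URL_HOST);
            if (!is_string($host) || $host === '') {
                return $m[0]; // leave anything parse_url() cannot handle untouched
            }
            return '<a class="bwl" href="' . $m[0] . '">'
                . htmlspecialchars($host, ENT_QUOTES) . '</a>';
        },
        $escaped
    );

    // preg_replace_callback() returns null on a regex error; fall back to the escaped text.
    return $result ?? $escaped;
}

echo linkify_hosts('blahblah https://www.facebook.com/?sk=ff&ap=1 blah');
```

Letting parse_url pick out the host keeps the regex itself trivial; it only has to find where a URL starts and ends.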

Upvotes: 0

Lord Loh.

Reputation: 2477

I am not a regex whizz either,

^(.*)//(.*)/(.*)$
<a href="\1//\2/\3">\2</a>

was what worked for me when I tried it as a find-and-replace in Programmer's Notepad.

^(.*)// should extract the protocol, referred to as \1 in the second line. (.*)/ should extract everything up to the last / (the .* is greedy, so it does not stop at the first one), referred to as \2 in the second line. (.*)$ captures everything up to the end of the string, referred to as \3 in the second line.
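
To see what the groups actually capture, here is a small PHP sketch. The URL with a deeper path is made up for illustration, and the lazy variant (.*?) is an addition of mine for comparison, not part of the pattern above.

```php
<?php
// Made-up URL with several slashes in the path, to show where group 2 stops.
$url = 'http://www.youtube.com/user/example/videos';

preg_match('#^(.*)//(.*)/(.*)$#', $url, $greedy); // pattern as written above
preg_match('#^(.*)//(.*?)/(.*)$#', $url, $lazy);  // same, but group 2 is lazy

echo $greedy[1], "\n"; // http:
echo $greedy[2], "\n"; // www.youtube.com/user/example
echo $lazy[2], "\n";   // www.youtube.com
echo $lazy[3], "\n";   // user/example/videos
```

With a URL like the one in the question, which has only one slash after the host, both variants give the same result.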


Added later

^(.*)( )(.*)//(.*)/(.*)( )(.*)$
\1\2<a href="\3//\4/\5">\4</a> \7

This should be a bit better, but it will replace only one URL.

Upvotes: 2

Battle_707

Reputation: 708

This is the simplest and cleanest way:

$str = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
preg_match("#//(.+?)/#", $str, $matches);

$site_url = $matches[1];

EDIT: I assume that the $str had been checked to be a URL in the first place, so I left that out. Also, I assume that all the URLs will contain either 'http://' or 'https://'. In case the url is formatted like this www.youtube.com/watch?v=spsnQWtsUFM or even youtube.com/watch?v=spsnQWtsUFM, the above regexp won't work!

EDIT2: I'm sorry, I didn't realize that you were trying to replace all the URLs in a whole text. In that case, this should work the way you want:

$str = preg_replace('#(\A|[^=\]\'"a-zA-Z0-9])(http[s]?://(.+?)/[^()<>\s]+)#i', '\\1<a href="\\2">\\3</a>', $str);
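
As a quick check, the pattern can be run against the comment from the question. This is only a sketch: the $body variable and the htmlspecialchars step are carried over from the question, and the variable name $plain is made up.

```php
<?php
$pattern     = '#(\A|[^=\]\'"a-zA-Z0-9])(http[s]?://(.+?)/[^()<>\s]+)#i';
$replacement = '\\1<a href="\\2">\\3</a>';

// The comment from the question, escaped the same way the question does it.
$body = 'blahblah https://www.facebook.com/?sk=ff&ap=1 blah';
$str  = preg_replace($pattern, $replacement, htmlspecialchars($body, ENT_QUOTES));
echo $str, "\n";
// blahblah <a href="https://www.facebook.com/?sk=ff&amp;ap=1">www.facebook.com</a> blah

// A URL with no slash after the host is left alone by this pattern.
$plain = preg_replace($pattern, $replacement, 'see http://example.com now');
echo $plain, "\n";
// see http://example.com now
```

The second call shows a limit worth knowing about: the pattern needs a slash after the host followed by at least one more character.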

Upvotes: 5

Tudor Constantin

Reputation: 26871

I think this should do it (I haven't tested it):

preg_match('/^http[s]?:\/\/(.+?)\/.*/i', $main_url, $matches);
$final_url = '<a href="'.$main_url.'">'.$matches[1].'</a>';

Upvotes: 0

JMTyler

Reputation: 1604

The \0 is replaced by the entire matched string, whereas \x (where x is a number starting at 1) is replaced by the corresponding subpart of the match: each pair of parentheses defines a group, and the groups are numbered in the order they appear. Your solution is as follows:

ereg_replace("[[:alpha:]]+://([^<>[:space:]]+[[:alnum:]]*)[[:alnum:]/]", "<a href=\"\\0\" class=\"bwl\" target=\"_new\">\\1</a>", htmlspecialchars($body, ENT_QUOTES));

I haven't been able to test this though so let me know if it works.
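
Since ereg_replace was removed in PHP 7, here is a rough preg_replace sketch of the same \0 versus \1 idea. The pattern is a simplified assumption of mine that captures only the host in group 1; it is not a drop-in equivalent of the expression above.

```php
<?php
$text = 'see http://www.youtube.com/watch?v=spsnQWtsUFM now';

// \0 is the whole match (the full URL); \1 is the first parenthesised group (the host).
$linked = preg_replace(
    '#[a-z]+://([^/\s<>]+)[^\s<>]*#i',
    '<a href="\0">\1</a>',
    $text
);

echo $linked;
// see <a href="http://www.youtube.com/watch?v=spsnQWtsUFM">www.youtube.com</a> now
```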

Upvotes: 0
