kwichz

An intelligent regex to convert links (from BBCode and plain text) into HTML

I'm trying to write my own function to convert link strings from BBCode (and from plain text) into proper HTML.

At the moment I have this:

$format_search=array(
    // [url=link]link_name[/url]: $1 = URL, $4 = link text
    '#\[url=(((http|https|ftp)://)[a-zA-Z0-9\-_\./\?=&;\#]+)\](.*?)\[/url\]#i',
    // bare URL not preceded by >, / or "
    '#(?<![>/"])(((http|https|ftp)://)[a-zA-Z0-9\-_\./\?=&;\#]+)#im',
);

$format_replace=array(
    '<a class="lforum" href="$1">$4</a>',
    '<a class="lforum" href="$1">$1</a>',
);

$str=preg_replace($format_search, $format_replace, $str);

It works, more or less :)

Catching BBCode such as [url=link]link_name[/url] is not a problem. The problem is catching every other kind of link on the website (for example, when a user enters http://link.com without any BBCode).

For example, [b]http://links[/b] doesn't work... and there are many other scenarios to take into consideration!

I don't know how to write a good function without conflicts. I mean: first parse the string for links written as BBCode; then parse the rest, ideally without re-replacing the links converted in the first pass.
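One way to get that "BBCode first, then the rest" order without conflicts is a single pattern with two alternatives, handled by preg_replace_callback. This is a swapped-in technique, not something from the question: linkify() is a hypothetical name, and only the URL character class is copied from the question. Because the regex engine consumes each [url]...[/url] tag in one match, a URL inside a tag is never offered to the bare-URL branch.

```php
<?php
// Hypothetical helper (not from the question): converts [url=...]...[/url]
// tags and bare URLs in one left-to-right pass, so nothing is linkified twice.
function linkify(string $str): string
{
    // URL sub-pattern reusing the question's character class.
    $url = '(?:http|https|ftp)://[a-zA-Z0-9\-_\./\?=&;\#]+';
    // Alternative 1: BBCode tag (group 1 = URL, group 2 = text).
    // Alternative 2: bare URL (group 3).
    $pattern = '#\[url=(' . $url . ')\](.*?)\[/url\]|(' . $url . ')#i';

    return preg_replace_callback($pattern, function (array $m): string {
        if ($m[1] !== '') {
            // BBCode branch matched: keep the tag's own link text.
            return '<a class="lforum" href="' . $m[1] . '">' . $m[2] . '</a>';
        }
        // Bare-URL branch matched: the URL doubles as the link text.
        return '<a class="lforum" href="' . $m[3] . '">' . $m[3] . '</a>';
    }, $str);
}

echo linkify('[url=http://a.com]A[/url] and http://b.com'), "\n";
```

With this, linkify('[b]http://links[/b]') wraps only the URL and leaves the [b] tags for a later BBCode pass. Like the original preg_replace call, it does no HTML escaping of the link text.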

Do you have any suggestions?

Answers (1)

mario

Not with that approach. The (?<![>/"]) lookbehind is what prevents it from working. Its purpose is to prevent double-linkifying <a>http://example.com</a>, but it also prevents matches on <b>http://example.com.
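A quick check of that behavior, using the bare-URL pattern exactly as it appears in the question (the sample strings are made up for illustration):

```php
<?php
// The question's bare-URL pattern, including the (?<![>/"]) lookbehind.
$bare = '#(?<![>/"])(((http|https|ftp)://)[a-zA-Z0-9\-_\./\?=&;\#]+)#im';

// Intended: a URL that is already the text of an <a> link is skipped.
var_dump(preg_match($bare, '<a>http://example.com</a>'));   // int(0)

// Side effect: a URL right after <b> (or any other tag) is skipped too.
var_dump(preg_match($bare, '<b>http://example.com</b>'));   // int(0)

// A URL preceded by ordinary text still matches.
var_dump(preg_match($bare, 'see http://example.com'));      // int(1)
```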

One workaround would be to change your output links so that the class attribute comes last, directly before the link text:

 '<a href="$1" class="lforum">$4</a>',

This allows you to use the class= attribute in the negative lookbehind:

 (?<![/"]|class="lforum">)http..

That way, the pattern still matches URLs that follow tags other than your own <a> links (such as <b>).
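A sketch of this workaround, assuming the question's two-pattern preg_replace setup (with the missing comma after the first pattern restored); the sample input string is made up for illustration:

```php
<?php
// Output links put class="lforum" last, so the link text always follows
// the exact string class="lforum">.
$format_search = array(
    // [url=link]link_name[/url]: $1 = URL, $4 = link text
    '#\[url=(((http|https|ftp)://)[a-zA-Z0-9\-_\./\?=&;\#]+)\](.*?)\[/url\]#i',
    // Bare URL: skip it after / or " (attribute values) and after our own
    // <a ... class="lforum"> tag, but not after other tags such as <b>.
    '#(?<![/"]|class="lforum">)(((http|https|ftp)://)[a-zA-Z0-9\-_\./\?=&;\#]+)#i',
);
$format_replace = array(
    '<a href="$1" class="lforum">$4</a>',
    '<a href="$1" class="lforum">$1</a>',
);

$str = '<b>http://example.com</b> [url=http://example.org]http://example.org[/url]';
echo preg_replace($format_search, $format_replace, $str), "\n";
```

Here the URL inside <b> gets linkified, while the URL produced by the [url] conversion (both in href and in the link text) is left alone. PCRE accepts the lookbehind because each alternative has a fixed length, even though the lengths differ.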


Another approach would be to pre-convert raw text URLs into BBCode before you convert your BBCode into HTML. Use your existing URL regex for that, prefix it with e.g. (?<![\]=]), and use [url=$1]$1[/url] as the replacement instead.
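A minimal sketch of that two-pass approach. The function names bare_urls_to_bbcode() and bbcode_urls_to_html() are hypothetical; the regexes are the question's patterns with the suggested (?<![\]=]) prefix:

```php
<?php
// Pass 1 (hypothetical helper): wrap bare URLs in [url=...]...[/url].
// (?<![\]=]) skips URLs right after "=" (a [url=...] target) or "]"
// (link text already inside a tag).
function bare_urls_to_bbcode(string $str): string
{
    return preg_replace(
        '#(?<![\]=])(((http|https|ftp)://)[a-zA-Z0-9\-_\./\?=&;\#]+)#i',
        '[url=$1]$1[/url]',
        $str
    );
}

// Pass 2 (hypothetical helper): the question's BBCode rule, now the only
// rule that produces <a> links.
function bbcode_urls_to_html(string $str): string
{
    return preg_replace(
        '#\[url=(((http|https|ftp)://)[a-zA-Z0-9\-_\./\?=&;\#]+)\](.*?)\[/url\]#i',
        '<a class="lforum" href="$1">$4</a>',
        $str
    );
}

$str = 'Visit http://example.com or [url=http://example.org]this site[/url].';
echo bbcode_urls_to_html(bare_urls_to_bbcode($str)), "\n";
```

Note that with this exact lookbehind, a URL right after any closing bracket, as in [b]http://links[/b], is also skipped in pass 1 because "]" precedes it; covering that case would need a narrower check than a single-character lookbehind.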
