Yelneerg

Reputation: 171

Using regex in preg_replace to match an html href anchor tag

I'm trying to use preg_replace to replace

&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;

with

<a href="WWW.ANYURL.COM">DISPLAY_TEXT</a>

Here is my code:

$string = htmlentities(mysql_real_escape_string($string1)); 
$newString = preg_replace('#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#','<a href="$1">$2</a>',$string);

If I do limited tests such as:

$newString = preg_replace('#&lt;a\ href#','TEST',$string);

then

&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;

becomes

TEST=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;

But if I try to get it to also match the "=", it acts as if it couldn't find a match, i.e.

$newString = preg_replace('#&lt;a\ href=#','TEST',$string);

returns the original unchanged:

&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;

I've been at this for a couple of hours; any help would be greatly appreciated.

EDIT: code in context

$title = clean_input($_POST['title']);
$story = clean_input($_POST['story']);

function clean_input($string)
{
    if (get_magic_quotes_gpc()) {
        $string = stripslashes($string);
    }
    $string = htmlentities(mysql_real_escape_string($string));
    $findValues = array("&lt;b&gt;", "&lt;/b&gt;");
    $newValues = array("<b>", "</b>");
    $newString = str_replace($findValues, $newValues, $string);
    $newString2 = preg_replace('#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#', '<a href="$1">$2</a>', $newString);
    return $newString2;
}

Sample $story = Lorem ipsum dolor sit amet, consectetur adipiscing elit. <a href="www.google.com">Google</a> Vivamus quis sem felis. Morbi vitae neque ac neque blandit malesuada lobortis sit amet justo. Donec convallis, nibh ut lacinia tempor, neque felis scelerisque nibh, at feugiat lectus erat in nulla. In et euismod nunc. <pernicious code></code>Pellentesque vitae ante orci, vitae ultrices neque. <a href="www.yahoo.com">Yahoo</a> In non nulla sapien, vestibulum faucibus metus. Fusce egestas viverra arcu, <b>ac</b> sagittis leo facilisis in. Nulla facilisi.

I want only a few tags, such as links (a href) and bold, to be allowed through as markup; everything else should stay escaped.

Upvotes: 2

Views: 1933

Answers (2)

mario

Reputation: 145482

You don't need to manually replace anything. If this is your whole input string, then use html_entity_decode() to turn the escapes back into < and >.


Again, your regex works as intended with the sample text.

Your problem is the premature mysql_real_escape_string() call. It adds a backslash before each double quote (") in your HTML, and that's why back-converting fails: your regex is not prepared to find \&quot;.
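
The effect can be reproduced without a database connection. The sketch below uses addslashes() as a stand-in for mysql_real_escape_string(). That substitution is an assumption made for illustration: both put a backslash before double quotes, but the real function needs an open MySQL connection and no longer exists in PHP 7 and later. The pattern and the sample link are taken from the question.

```php
<?php
// Pattern and replacement copied from the question.
$pattern = '#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#';
$replacement = '<a href="$1">$2</a>';
$input = '<a href="www.google.com">Google</a>';

// Assumption: addslashes() stands in for mysql_real_escape_string();
// like it, addslashes() prefixes every " with a backslash.
$escapedFirst = htmlentities(addslashes($input));
// $escapedFirst now contains \&quot; where the pattern expects &quot;

// Same input, but without the premature database escaping.
$entitiesOnly = htmlentities($input);

$resultEscapedFirst = preg_replace($pattern, $replacement, $escapedFirst);
$resultEntitiesOnly = preg_replace($pattern, $replacement, $entitiesOnly);

echo $resultEscapedFirst, "\n"; // unchanged: the pattern does not match
echo $resultEntitiesOnly, "\n"; // <a href="www.google.com">Google</a>
```

Only the second call finds a match, because the backslash sits between href= and &quot; in the first string.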

Avoid that. Get rid of the ugly clean_input() hack and magic_quotes, as advised by the manual. You must do the database escaping right before inserting into the database, not earlier. (Or better yet, use PDO with prepared statements, which is easier.)
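
For the PDO route, a minimal sketch could look like the following. The table name stories, its columns, and the in-memory SQLite database are assumptions chosen so the example runs without a MySQL server (it needs the pdo_sqlite extension); with MySQL only the DSN passed to the PDO constructor would differ.

```php
<?php
// Hypothetical setup: an in-memory SQLite table stands in for the real MySQL table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE stories (title TEXT, story TEXT)');

// Raw user input, with quotes and markup left exactly as submitted.
$title = 'It\'s a "test"';
$story = '<a href="www.google.com">Google</a>';

// Placeholders keep the values out of the SQL text, so no manual escaping is needed.
$insert = $pdo->prepare('INSERT INTO stories (title, story) VALUES (:title, :story)');
$insert->execute(array(':title' => $title, ':story' => $story));

// The stored value comes back byte for byte, without added backslashes.
$row = $pdo->query('SELECT title, story FROM stories')->fetch(PDO::FETCH_ASSOC);
echo $row['story'], "\n";
```

With this approach the HTML filtering can happen at output time, independently of how the text is stored.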

Also avoid the $newString123 variable duplicates; just overwrite the variable you already have when rewriting strings.
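
Putting these points together, the function from the question might be reduced to something like this sketch. The name allow_basic_tags() is hypothetical, and database escaping is deliberately left out because it belongs at insert time.

```php
<?php
// Hypothetical rewrite of the question's function: no database escaping,
// and a single variable that is overwritten at each step.
function allow_basic_tags($string)
{
    // Neutralise all markup first.
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');

    // Re-enable <b> and </b>.
    $string = str_replace(array('&lt;b&gt;', '&lt;/b&gt;'), array('<b>', '</b>'), $string);

    // Re-enable simple links, using the pattern from the question.
    $string = preg_replace(
        '#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#',
        '<a href="$1">$2</a>',
        $string
    );

    return $string;
}

$story = 'See <a href="www.google.com">Google</a>, <b>ac</b> <script>x</script>';
echo allow_basic_tags($story), "\n";
// See <a href="www.google.com">Google</a>, <b>ac</b> &lt;script&gt;x&lt;/script&gt;
```

Note that the pattern does not check the URL itself, so a link such as javascript:... would also be let through; a stricter filter would need to validate the href value.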

Upvotes: 5

Morten Kristensen

Reputation: 7613

You could also do it like this:

$str = "&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;";
echo "Your html code is thus: " . htmlspecialchars_decode($str);

Upvotes: 1
