Yobogs
Yobogs

Reputation: 452

Regex to replace url in html page

i have an html page

<tr>
<td rowspan="7">
<a href="http://www.link1.com/" style="text-decoration: none;">
        <img src="image1.jpg" width="34" height="873" alt="" style="display:block;border:none" />
        </a>
    </td>
    <td colspan="2" rowspan="2">
        <a href='http://www.link1.com/test.php?c=1'>
        <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
        </a>
    </td>
<td colspan="2" rowspan="2">
        <a href='http://www.url.com/test.php?c=1'>
        <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
        </a>
    </td>

I want to replace all url in href by mytest.com?url=$link

I try with :

    $messaget = preg_replace('/<a(.*)href="([^"]*)"(.*)>/','mytest.com?url=$2',$messaget);

Upvotes: 1

Views: 1691

Answers (4)

janos
janos

Reputation: 124646

This may help you in the short run:

preg_replace('/<a (.*)href=[\'"]([^"]*)[\'"](.*)>/', '<a $1href="mytest.com?url=$2"$3>', $messaget);

In your regex you were using href="...", that is, double quotes, but in your HTML you have a mixture of both double and single quotes.

And in the replacement string you forgot to include $1 and $3.

That said, DO NOT use regex to parse HTML. The answer by @BenLanc below is better, use that instead. Read the link he posted.

Upvotes: 1

BenLanc
BenLanc

Reputation: 2434

Don't use regex on HTML, HTML is not regular.

Assuming your markup is valid (and if it's not, pass it through Tidy first), you should use xpath, to grab the elements and then update the href directly. For example:

<?php
$messaget = <<<XML
<tr>
  <td rowspan="7">
    <a href="http://www.link1.com/" style="text-decoration: none;">
      <img src="image1.jpg" width="34" height="873" alt="" style="display:block;border:none" />
    </a>
  </td>
  <td colspan="2" rowspan="2">
      <a href='http://www.link1.com/test.php?c=1'>
      <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
      </a>
  </td>
  <td colspan="2" rowspan="2">
      <a href='http://www.url.com/test.php?c=1'>
      <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
      </a>
  </td>
</tr>
XML;

$xml   = new SimpleXMLElement($messaget);

// Select all "a" tags with href attributes
$links = $xml->xpath("//a[@href]");

// Loop through the links and update the href, don't forget to url encode the original!
foreach($links as $link)
{
  $link["href"] = sprintf("mytest.com/?url=%s", urlencode($link['href']));
}

// Return your HTML with transformed hrefs!
$messaget = $xml->asXml();

Upvotes: 1

lordkain
lordkain

Reputation: 3109

Regex to match an url:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/  

More background info

Upvotes: 0

robinef
robinef

Reputation: 312

Don't forget /m at the end of your regexp since your are using multiline source:

PHP Doc PCRE

Upvotes: 0

Related Questions