Pez Cuckow

Reputation: 14422

Strip an entire HTML link (including its text) with PHP

I have a collection of text that I am processing dynamically with PHP (the data comes from an XML file), and I want to strip out each link together with the text that is linked.

PHP's strip_tags() removes the opening <a ...> and closing </a> tags but leaves the text in between.

I am currently trying this regex: preg_replace('#(<a.*?>).*?(</a>)#', '', $content);

Note also that the links have style, class, href and title attributes.

Does anyone know the solution?

Upvotes: 2

Views: 978

Answers (5)

Pez Cuckow

Reputation: 14422

I used the solutions posted as comments; they seemed to work best and were exactly what I was looking for!

"For reference, you've grouped the anchor tags but not the content, which is where the problem lies. preg_replace replaces the grouped element (those included in parenthesis). You can try the following though: #(<a[^>]*?>.*?</a>)#i (i flag for a case insensitive compare)" – Brad Christie
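A quick sketch of that pattern in use, on a made-up sample whose link has uppercase tags, several attributes and text that wraps onto a second line. The helper names strip_links_i() and strip_links_is() are hypothetical. Without the s flag, "." does not match a newline, so a link whose text spans lines is left in place; adding s fixes that.

```php
<?php
// Hypothetical helper: the comment's pattern unchanged, removing each
// <a ...>...</a> element with case-insensitive tag matching (i flag).
function strip_links_i(string $html): string
{
    return preg_replace('#(<a[^>]*?>.*?</a>)#i', '', $html);
}

// Hypothetical helper: same pattern plus the s flag, so "." also matches
// newlines and link text that wraps onto another line is removed too.
function strip_links_is(string $html): string
{
    return preg_replace('#(<a[^>]*?>.*?</a>)#is', '', $html);
}

$sample = "Read <A HREF=\"/x\" class=\"c\">the\nmanual</A> first.";
echo strip_links_i($sample), "\n";  // unchanged: the link text contains a newline
echo strip_links_is($sample), "\n"; // prints "Read  first."
```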

"briefly tested shorter regex version, just for fun :) preg_replace ('/<(?:a|\/)[^>]*>/', '', $data);" – Cyber-Guard Design
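For comparison, a quick check of what the shorter pattern matches, on a made-up sample string: it removes "<a ...>" opening tags and every closing tag "</...>", so the linked text itself stays in place, much as with strip_tags() limited to links.

```php
<?php
// The shorter pattern from the comment: "<a" followed by anything up to ">"
// (an opening link tag), or "</" followed by anything up to ">" (any closing tag).
$data = 'Hello <a href="x" class="y">world</a>!';
$result = preg_replace('/<(?:a|\/)[^>]*>/', '', $data);

echo $result, "\n"; // prints "Hello world!": the tags go, the link text stays
```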

Upvotes: 0

profitphp

Reputation: 8354

Try this:

$content=preg_replace('/<a[^>]*>(.*)<\/a>/iU','',$content);
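The U modifier is what keeps (.*) from over-matching here: it makes quantifiers ungreedy by default. A small sketch on a made-up line containing two links, comparing the pattern with and without U (the variable names are illustrative):

```php
<?php
$content = "a <a href='1'>x</a> b <a href='2'>y</a> c";

// With U, (.*) is ungreedy, so each link is matched on its own.
$ungreedy = preg_replace('/<a[^>]*>(.*)<\/a>/iU', '', $content);

// Without U, (.*) is greedy and runs to the LAST </a>, eating " b " as well.
$greedy = preg_replace('/<a[^>]*>(.*)<\/a>/i', '', $content);

echo $ungreedy, "\n"; // prints "a  b  c"
echo $greedy, "\n";   // prints "a  c"
```

As with the other patterns, "." still stops at a newline unless the s modifier is added as well (/.../isU).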

Upvotes: 3

karim79

Reputation: 342635

You can use DOMDocument, for example (untested!):

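$doc = new DOMDocument();
libxml_use_internal_errors(true); // don't print warnings about sloppy markup
$doc->loadHTMLFile('foo.php');
libxml_clear_errors();
// getElementsByTagName() returns a live list: removing a node shifts the rest
// down, so a forward loop skips every other link. Walk it backwards instead.
$links = $doc->getElementsByTagName('a');
for ($i = $links->length - 1; $i >= 0; $i--) {
    $link = $links->item($i);
    $link->parentNode->removeChild($link);
}
$doc->saveHTMLFile('output.html');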

Or using Simple HTML DOM Parser:

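require_once 'simple_html_dom.php'; // the library file; adjust the path as needed
$html = file_get_html('http://www.example.com/');
foreach ($html->find('a') as $element) {
    $element->outertext = ''; // drop the whole element, tag and text
}
$html->save('output.html');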
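Since the question's data is already in memory (it comes from an XML file), here is a minimal DOMDocument sketch that works on a string instead of a file. The helper name strip_links_dom() is hypothetical, and the fragment is wrapped in a <div> so its children can be serialized back out afterwards.

```php
<?php
// Hypothetical helper: remove every <a> element (tag and text) from an
// HTML fragment held in a string, using DOMDocument instead of a regex.
function strip_links_dom(string $html): string
{
    $doc = new DOMDocument();

    // Keep libxml from printing warnings about imperfect real-world markup.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so all of it ends up under one known element.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so walk it backwards:
    // removing a node never shifts the indexes still to be visited.
    $links = $doc->getElementsByTagName('a');
    for ($i = $links->length - 1; $i >= 0; $i--) {
        $link = $links->item($i);
        $link->parentNode->removeChild($link);
    }

    // The wrapper is the first <div> in document order; serialize its children.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_links_dom('<p>Read <a href="/x" class="c" title="t">the manual</a> first.</p>'), "\n";
```

Unlike the regex versions, this handles attributes, uppercase tags and line breaks inside links without any extra flags. Note that loadHTML() assumes ISO-8859-1 when the markup declares no charset, so UTF-8 text may need a charset declaration added first.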

Upvotes: 2

John Giotta

Reputation: 16944

With regex (not thoroughly tested):

echo preg_replace('#(<a.*?>)(.*?)(<\/a>)#','$2', $str);

Also, preg_replace()'s optional fourth argument, $limit, caps the number of replacements; -1 (the default) means no limit.
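The $limit argument, shown on a made-up sample string (the variable names and pattern are illustrative, not from the answer). This sketch removes the whole link, as the question asks, rather than keeping the link text the way the '$2' replacement does; the \b after "a" stops the pattern from matching tags such as <abbr>.

```php
<?php
$str = 'one <a href="1">x</a> two <a href="2">y</a> three';
$pattern = '#<a\b[^>]*>.*?</a>#is';

// $limit = 1: only the first match is replaced.
$first = preg_replace($pattern, '', $str, 1);

// $limit = -1 (the default): every match is replaced, and the optional
// fifth argument $count receives the number of replacements made.
$all = preg_replace($pattern, '', $str, -1, $count);

echo $first, "\n"; // prints: one  two <a href="2">y</a> three
echo $all, "\n";   // prints: one  two  three
echo $count, "\n"; // prints: 2
```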

Upvotes: -1

KingCrunch

Reputation: 131901

Because the a element is not the only one that can break your page, you should rather use a whitelist approach, such as strip_tags() with its list of allowed tags.
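One possible reading of the whitelist suggestion, sketched under the assumption that links should still disappear entirely while a short list of formatting tags survives. The helper clean_html() and its default tag list are hypothetical; strip_tags()'s second parameter names the tags to keep, and every other tag is removed while its text is kept.

```php
<?php
// Hypothetical helper: drop links together with their text, then keep only
// the whitelisted tags; any other tag is stripped but its text survives.
function clean_html(string $html, string $allowedTags = '<p><br><strong><em>'): string
{
    $withoutLinks = preg_replace('#<a\b[^>]*>.*?</a>#is', '', $html);
    return strip_tags($withoutLinks, $allowedTags);
}

echo clean_html('<p>See <a href="/x">this</a> and <b>that</b>.</p>'), "\n";
// prints: <p>See  and that.</p>
```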

Upvotes: 0
