Reputation: 14422
I have a collection of text that I am processing dynamically with PHP (the data comes from an XML file). I want to strip each <a> link together with the text that is linked.
PHP's strip_tags takes out the <a etc...> and </a> but not the text in between.
I am currently trying to use the Regex preg_replace('#(<a.*?>).*?(</a>)#', '', $content);
Another thing to note is the links have styles, classes, href and titles.
Does anyone know the solution?
Upvotes: 2
Views: 978
Reputation: 14422
I used the solution(s) posted as comments; they seemed to work best and were exactly what I was looking for!
"For reference, you've grouped the anchor tags but not the content, which is where the problem lies. preg_replace replaces the grouped element (those included in parenthesis). You can try the following though: #(<a[^>]*?>.*?</a>)#i (i flag for a case insensitive compare)" – Brad Christie
"briefly tested shorter regex version, just for fun :) preg_replace ('/<(?:a|\/)[^>]*>/', '', $data);" – Cyber-Guard Design
Upvotes: 0
Reputation: 8354
try this:
$content=preg_replace('/<a[^>]*>(.*)<\/a>/iU','',$content);
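A slightly more defensive variant of the same idea, as a minimal sketch only. The helper name stripAnchorsWithText and the sample string are made up for illustration. It assumes no attribute value contains a literal ">" and that anchors are not nested; a regex cannot handle those cases reliably.

```php
<?php
// Hypothetical helper name; not taken from the posts above.
function stripAnchorsWithText(string $content): string
{
    // \b stops <abbr>, <article> etc. from matching as if they were <a>.
    // The s flag lets . span newlines; .*? is lazy, so each match ends at the nearest </a>.
    // preg_replace returns null on a PCRE error, so fall back to the input in that case.
    return preg_replace('#<a\b[^>]*>.*?</a>#is', '', $content) ?? $content;
}

// Hypothetical sample: a link with attributes and a line break inside its text.
$sample = "<p>Read <a class=\"ext\" href=\"/x\" title=\"t\">this\nlink</a> and <abbr>etc.</abbr></p>";
echo stripAnchorsWithText($sample), PHP_EOL;
```

The \b and the s flag are the only real differences from the pattern above: without them, tags such as <abbr> are treated as links, and links whose text wraps onto a second line are left untouched.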
Upvotes: 3
Reputation: 342635
You can use DOMDocument, for example (untested!):
$doc = new DOMDocument();
$doc->loadHTMLFile('foo.php');
$anchors = $doc->getElementsByTagName('a');
// The node list is live, so walk it backwards; removing while counting up skips elements.
for ($i = $anchors->length - 1; $i >= 0; $i--) {
    $anchor = $anchors->item($i);
    $anchor->parentNode->removeChild($anchor);
}
$doc->saveHTMLFile('output.html');
Or using Simple HTML DOM Parser:
$html = file_get_html('http://www.example.com/');
foreach($html->find('a') as $element) {
    $element->outertext = '';
}
$html->save('output.html');
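If the markup is already in a string rather than a file (as it would be when read from an XML feed), the DOMDocument approach can be adapted roughly as follows. This is a sketch under assumptions: the helper name removeAnchorElements and the wrapper <div> are illustrative, the DOM extension must be enabled, and the input is assumed to be an ASCII or Latin-1 HTML fragment without stray closing </div> tags.

```php
<?php
// Hypothetical helper; not part of the answer above.
function removeAnchorElements(string $html): string
{
    // Collect parser warnings internally instead of emitting them for imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment so its contents can be located again after parsing.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $anchors = $doc->getElementsByTagName('a');
    // The list is live: iterate backwards so removals do not shift pending items.
    for ($i = $anchors->length - 1; $i >= 0; $i--) {
        $anchor = $anchors->item($i);
        $anchor->parentNode->removeChild($anchor);
    }

    // Serialize only the children of the wrapper div (the first div in document order).
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeAnchorElements('<p>See <a href="x" class="c" title="t">this page</a> for details.</p>'), PHP_EOL;
```

Unlike a regex, the parser does not care which attributes the link carries or whether the tag spans several lines.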
Upvotes: 2
Reputation: 16944
With a regex (not thoroughly tested):
echo preg_replace('#(<a.*?>)(.*?)(<\/a>)#','$2', $str);
Note that '$2' keeps the link text; pass '' as the replacement to remove it as well. Also, the limit argument defaults to -1, which means no limit.
Upvotes: -1
Reputation: 131901
Because the a element is not the only one that can break your page, you are better off using a whitelist approach, such as strip_tags() with its allowed-tags argument.
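A minimal sketch of that whitelist idea, using a made-up sample string and an arbitrary choice of allowed tags (<p> and <b>). Keep in mind that strip_tags only drops the tags themselves: the linked text stays, so it would have to be combined with one of the removal approaches above to get rid of the text too.

```php
<?php
// Hypothetical sample input; only <p> and <b> are whitelisted here.
$html = '<p>See <a href="x">this</a> <b>bold</b> <i>note</i></p>';

// The second argument lists the tags to keep; every other tag is removed, its text is kept.
$clean = strip_tags($html, '<p><b>');

echo $clean, PHP_EOL; // <p>See this <b>bold</b> note</p>
```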
Upvotes: 0