koressak

Reputation: 191

How to remove the HTML part of a text in PHP

I have a question about parsing text and removing unwanted HTML parts. I know about functions like strip_tags(), which removes all the tags, but the problem is that it leaves the text inside the tags in place.

Let me show you an example. We have this text:

Hello, how are you? <a href="">Link to my website</a> __Here continues html tags, links, images__

What I want is to remove the whole part where the HTML resides: not only the tags, but also the text inside them (like "Link to my website" above).

Is there an efficient way to do this, or a function that I missed?

Upvotes: 2

Views: 3765

Answers (7)

Yoshi

Reputation: 54649

Try this:

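function removeTags($str) {
    $result = '';

    // loadHTML() is an instance method; calling it statically fails on PHP 8.
    $doc = new DOMDocument();
    $doc->loadHTML(sprintf('<body>%s</body>', $str));

    // Keep only the text nodes that are direct children of <body>.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//body/text()') as $textNode) {
        $result .= $textNode->nodeValue;
    }

    return $result;
}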

echo removeTags(
    'Hello, how are you? <a href="">Link to my website</a> __Here continues html <span>tags</span>, links, images__'
);

Output:

Hello, how are you? __Here continues html , links, images__

Upvotes: 3

krystian

Reputation: 1

Maybe this will work:

http://htmlpurifier.org/

Here is a tutorial:

http://www.zendcasts.com/writing-custom-zend-filters-with-htmlpurifier/2011/06/

It's for Zend Framework, but I think it may help.

Upvotes: 0

winya

Reputation: 151

Some preg magic?

$text = preg_replace('/<[\/\!]*?[^<>]*?>/si', '', $text);

Upvotes: 0

bahaa

Reputation: 1

I searched and found this solution:

$txt = "
<html>
<head><title>Something wicked this way comes</title></head>
<body>
This is the interesting stuff I want to extract
</body>
</html>";

$text = preg_replace("/<([^<>]*)>/", "", $txt);

echo htmlentities($text);

Upvotes: 0

k102

Reputation: 8079

Maybe it's not correct, but...

$str = 'Hello, how are you? <a href="">Link to my website</a> __Here continues html tags, links, ';
$rez = preg_replace("/\<.*\>/i",'',$str);
var_dump($rez);

gave me this output:

string 'Hello, how are you?  __Here continues html tags, links, ' (length=56)
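A note on why this pattern removes the link text as well: `.*` is greedy, so it matches from the first "<" to the last ">" on the line, and everything in between goes with it. The sketch below uses a made-up sample string to compare that with the lazy form `.*?`, which stops at the nearest ">" and so removes only the tags. Note that `.` does not match newlines unless the `s` modifier is added, so neither pattern spans several lines as written.

```php
<?php
// Made-up sample with two separate tags and plain text between them.
$str = 'One <b>two</b> three <i>four</i> five';

// Greedy: ".*" runs from the first "<" to the last ">",
// so the plain text " three " between the two tags is removed too.
$greedy = preg_replace('/<.*>/', '', $str);

// Lazy: ".*?" stops at the nearest ">", so only the tags themselves are removed.
$lazy = preg_replace('/<.*?>/', '', $str);

var_dump($greedy); // string(9) "One  five"
var_dump($lazy);   // string(23) "One two three four five"
```

So the greedy version also swallows any ordinary text that happens to sit between two separate tags, which may or may not be what is wanted.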

Upvotes: 1

Phliplip

Reputation: 3632

Why not make it a rule that the submitted input is not allowed to contain tags?

function containsIllegalHtml($input, $allowable_tags = '') {
    // strip_tags() changes the string only if it contains disallowed tags.
    return $input != strip_tags($input, $allowable_tags);
}

Use this function to check whether the input contains tags or not.

Upvotes: 1

bahaa

Reputation: 1

You could write a function that takes a string, uses PHP's string functions to find the position of each "<" and the following ">", and strips them from the input string.
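A minimal sketch of that idea, assuming "strip them" means removing each "<", its matching ">" and everything between the two. The function name `stripBetweenBrackets` is hypothetical. Like strip_tags(), this removes only the tags and keeps the text between an opening and a closing tag.

```php
<?php
// Hypothetical helper: the name and exact behaviour are assumptions for illustration.
function stripBetweenBrackets(string $input): string
{
    $result = '';
    $offset = 0;

    // Find each "<" in turn, starting from the current offset.
    while (($open = strpos($input, '<', $offset)) !== false) {
        // Copy the text that precedes the "<".
        $result .= substr($input, $offset, $open - $offset);

        // Look for the ">" that closes this tag.
        $close = strpos($input, '>', $open);
        if ($close === false) {
            // Unterminated "<": drop the rest of the string.
            return $result;
        }

        // Continue scanning right after the ">".
        $offset = $close + 1;
    }

    // Append whatever follows the last ">".
    return $result . substr($input, $offset);
}

echo stripBetweenBrackets('Hello <b>world</b>!'); // Hello world!
```

A stray "<" in ordinary text (for example "a < b") is treated as the start of a tag here, so this is only suitable for input where that cannot happen.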

Upvotes: 0
