janoliver

Reputation: 7824

PHP: Filter specific html tags out of a given text

I googled a lot, since this kind of problem has come up many times before, but I didn't find anything that matches my needs.

I have an HTML-formatted text from a form, like this:

Hey, I am just some kind of <strong>formatted</strong> text!

Now, I want to strip all HTML tags that I don't allow. PHP's built-in strip_tags() function does that very well.

But I want to go a step further: I want to allow some tags only inside, or only outside, certain other tags. I also want to define my own XML tags.

Another example:

I am a custom xml tag: <book><strong>Hello!</strong></book>. Ok... <strong>Hi!</strong>

Now, I want the <strong/> inside of <book/> to be stripped, but the <strong>Hi!</strong> can stay the way it is.

So, I want to define rules for what I allow and what I don't, and have a filter do the rest.

Is there an easy way to do that? Regular expressions aren't what I'm looking for, since they can't parse HTML properly.

Regards, Jan Oliver

Upvotes: 1

Views: 2483

Answers (4)

janoliver

Reputation: 7824

I wrote my own Filter class based on the DOM classes of PHP. Look here: XHTMLFilter class
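The linked class is not reproduced here, so the following is only a minimal sketch of how a DOM-based filter of this kind might look. It is not the XHTMLFilter class itself. The function name unwrapInside() is hypothetical, and the sketch assumes the input is well-formed XML (no unclosed tags, no HTML-only entities such as &nbsp;).

```php
<?php
// Hypothetical helper: unwraps every <$inner> element that has a <$outer>
// ancestor, keeping the inner element's content in place.
function unwrapInside(string $xhtml, string $outer, string $inner): string
{
    $doc = new DOMDocument();
    // Wrap the fragment in a root element so it parses as one XML document.
    // Assumption: $xhtml is well-formed; loadXML() fails otherwise.
    $doc->loadXML('<root>' . $xhtml . '</root>');

    $xpath = new DOMXPath($doc);
    // Take a plain array copy of the matches before modifying the tree.
    $nodes = iterator_to_array($xpath->query("//{$outer}//{$inner}"));

    foreach ($nodes as $node) {
        // Move the children in front of the element, then drop the empty element.
        while ($node->firstChild !== null) {
            $node->parentNode->insertBefore($node->firstChild, $node);
        }
        $node->parentNode->removeChild($node);
    }

    // Serialize the children of the wrapper, leaving out <root> itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

$input = 'I am a custom xml tag: <book><strong>Hello!</strong></book>. Ok... <strong>Hi!</strong>';
echo unwrapInside($input, 'book', 'strong'), "\n";
```

With the example from the question, the <strong> inside <book> is unwrapped while the one outside stays untouched.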

Upvotes: 0

Tor Valamo

Reputation: 33789

Use a second argument to strip_tags, which is allowable tags.

$text = strip_tags($text, '<book><myxml:tag>');
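To illustrate what the allow-list does and does not do, here is a small sketch using the example string from the question. It shows that strip_tags() applies the list to the whole string, with no notion of where a tag sits.

```php
<?php
$input = 'I am a custom xml tag: <book><strong>Hello!</strong></book>. Ok... <strong>Hi!</strong>';

// Allow only <book>: every <strong> goes, inside and outside <book>.
echo strip_tags($input, '<book>'), "\n";
// prints: I am a custom xml tag: <book>Hello!</book>. Ok... Hi!

// Allow <book> and <strong>: every <strong> stays, including the one in <book>.
echo strip_tags($input, '<book><strong>'), "\n";
// prints the input unchanged
```

Neither call produces the result asked for in the question, because the allow-list has no notion of nesting.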

I don't think there's a way to only strip certain tags if they're not inside other tags, without using regex.

Also, regex isn't bad at parsing HTML; it's just slow compared to the alternatives. But parsing isn't what you're doing here anyway. You're going through the string and removing things you don't want. And for your more complex requirement, I think your only option is to use regex.

To be completely honest I think you should decide which tags are allowable and which aren't. Whether or not they are inside of other tags shouldn't matter at all. It's markup, not a script.

Upvotes: 1

Sarfraz

Reputation: 382861

The second argument lets you allow some tags:

string strip_tags ( string $str [, string $allowable_tags ] )

From php.net

Upvotes: 0

Alix Axel

Reputation: 154663

I don't think there is such a thing; I don't think even HTML Purifier does that.

I suggest you parse the XHTML by hand using something like Simple HTML Dom.
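As a rough sketch of what walking the tree by hand could look like, the example below uses PHP's built-in DOM extension instead of Simple HTML Dom, so that it runs without extra libraries. The RULES table and both function names are hypothetical, and the input is assumed to be well-formed XHTML.

```php
<?php
// Hypothetical rule table: tag => ancestor tags under which it is forbidden.
// A tag that is missing from the table is not allowed anywhere.
const RULES = [
    'book'   => [],
    'strong' => ['book'],
];

// Filters the children of $node; $ancestors lists the kept enclosing tags.
function filterChildren(DOMNode $node, array $ancestors): void
{
    // Iterate over a copy of the child list, because the loop changes the tree.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if (!$child instanceof DOMElement) {
            continue; // text nodes and the like are kept as they are
        }
        $name = $child->nodeName;
        $allowed = array_key_exists($name, RULES)
            && count(array_intersect(RULES[$name], $ancestors)) === 0;

        // Filter the subtree first ...
        filterChildren($child, $allowed ? array_merge($ancestors, [$name]) : $ancestors);

        if (!$allowed) {
            // ... then unwrap the disallowed element, keeping its content.
            while ($child->firstChild !== null) {
                $node->insertBefore($child->firstChild, $child);
            }
            $node->removeChild($child);
        }
    }
}

function filterMarkup(string $xhtml): string
{
    $doc = new DOMDocument();
    // Assumption: $xhtml is well-formed; loadXML() fails otherwise.
    $doc->loadXML('<root>' . $xhtml . '</root>');
    filterChildren($doc->documentElement, []);

    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

echo filterMarkup('<book><strong>Hello!</strong></book> <strong>Hi!</strong> <em>x</em>'), "\n";
```

Keeping the rules in a table means new tags or new nesting restrictions only require a change to RULES, not to the traversal.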

Upvotes: 2
