Reputation: 7824
I googled a lot, since these kinds of problems have been asked about many times before, but I didn't find anything that matches my needs.
I have some HTML-formatted text from a form, like this:
Hey, I am just some kind of <strong>formatted</strong> text!
Now, I want to strip all HTML tags that I don't allow. PHP's built-in strip_tags() function does that very well.
But I want to go a step further: I want to allow some tags only inside, or only outside, of certain other tags. I also want to define my own XML tags.
Another example:
I am a custom xml tag: <book><strong>Hello!</strong></book>. Ok... <strong>Hi!</strong>
Now, I want the <strong/> inside of <book/> to be stripped, but the <strong>Hi!</strong> can stay the way it is.
So, I want to define some rules for what I allow and don't allow, and have a filter do the rest.
Is there an easy way to do that? Regular expressions aren't what I'm looking for, since they can't parse HTML properly.
Regards, Jan Oliver
Upvotes: 1
Views: 2483
Reputation: 7824
I wrote my own Filter class based on the DOM classes of PHP. Look here: XHTMLFilter class
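The linked class is not reproduced here. As a rough illustration of the DOM-based idea only, here is a minimal sketch. The function name stripTagInside and its parameters are hypothetical, the input is assumed to be a well-formed XML/XHTML fragment, and the tag names are assumed to be plain names that are safe to put into an XPath expression:

```php
<?php
/**
 * Unwrap (drop the tag, keep its contents) every $forbidden element that
 * has an ancestor named $ancestor. Hypothetical helper, not the
 * XHTMLFilter class mentioned above.
 * Assumes $fragment is well-formed XML; returns it unchanged otherwise.
 */
function stripTagInside(string $fragment, string $forbidden, string $ancestor): string
{
    $doc = new DOMDocument();
    // Wrap in a dummy root so a fragment with several top-level nodes parses.
    if (!@$doc->loadXML('<root>' . $fragment . '</root>')) {
        return $fragment;
    }

    $xpath = new DOMXPath($doc);
    // Collect the matches into an array before modifying the tree.
    $matches = iterator_to_array($xpath->query("//{$ancestor}//{$forbidden}"));

    foreach ($matches as $node) {
        // Move the children up in front of the node, then drop the empty node.
        while ($node->firstChild !== null) {
            $node->parentNode->insertBefore($node->firstChild, $node);
        }
        $node->parentNode->removeChild($node);
    }

    // Serialize the children of the dummy root back into a string.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

$text = 'I am a custom xml tag: <book><strong>Hello!</strong></book>. Ok... <strong>Hi!</strong>';
echo stripTagInside($text, 'strong', 'book'), PHP_EOL;
// I am a custom xml tag: <book>Hello!</book>. Ok... <strong>Hi!</strong>
```

Because the rule is expressed as an XPath query over a parsed tree, the position of a tag relative to its ancestors can be taken into account, which a flat allow-list cannot do.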
Upvotes: 0
Reputation: 33789
Use the second argument to strip_tags, which lists the allowable tags.
$text = strip_tags($text, '<book><myxml:tag>');
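To make the behaviour concrete, here is a small sketch that runs strip_tags on the sample string from the question. The variable names are only illustrative. It shows that the allow-list applies to a tag name everywhere in the string, regardless of nesting:

```php
<?php
// Sample input taken from the question.
$text = 'I am a custom xml tag: <book><strong>Hello!</strong></book>. Ok... <strong>Hi!</strong>';

// Keep only <book>: every <strong> is removed, inside and outside of <book>.
$onlyBook = strip_tags($text, '<book>');

// Keep <book> and <strong>: nothing in this input is removed.
$both = strip_tags($text, '<book><strong>');

echo $onlyBook, PHP_EOL;
// I am a custom xml tag: <book>Hello!</book>. Ok... Hi!
echo $both, PHP_EOL;
// I am a custom xml tag: <book><strong>Hello!</strong></book>. Ok... <strong>Hi!</strong>
```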
I don't think there's a way to only strip certain tags if they're not inside other tags, without using regex.
Also, regex isn't bad at parsing HTML; it's just slow compared to the other options. But that's not what you're doing here anyway. You're going through the string and removing things you don't want. And for your complex requirement I think your only option is to use regex.
To be completely honest I think you should decide which tags are allowable and which aren't. Whether or not they are inside of other tags shouldn't matter at all. It's markup, not a script.
Upvotes: 1
Reputation: 382861
The second argument shows that you can allow some tags:
string strip_tags ( string $str [, string $allowable_tags ] )
From php.net
Upvotes: 0
Reputation: 154663
I don't think there is such a thing; I don't think even HTML Purifier does that.
I suggest you parse the XHTML by hand using something like Simple HTML DOM.
Upvotes: 2