Aamir Mahmood

Reputation: 2724

PCRE regular expression: string not containing

We have developed a Flash application with a WYSIWYG editor on the back end. We need to offer more functionality in the editor, so we decided to add custom tags <start more> ... </end more> to our WYSIWYG output.

All the HTML is parsed and converted to XML. The only problem is that we need to find the <start more> and </end more> tags so we can convert them into custom fade effects that reveal more content on a post inside Flash.

Long story short, here is sample XML output.

Some text outside <start more> some text inside</end more>
some other text <start more>1 and some random stuff <start more>2 and 
thing </end more>2 and random stuff </end more>

The regular expression I am using to match the start more and end more tags:

/(<start more>){1,1}(.+?)(<end more>)/

This expression captures the first <start more> and the first end tag that follows it, so for nested tags it pairs the outer start with the inner end. I tried a negative lookahead assertion to match only the innermost tags, but it is not working.

I hope that makes sense. Let me know if I have not explained the problem clearly.

Upvotes: 1

Views: 1428

Answers (2)

OrangeDog

Reputation: 38826

It is not possible to correctly parse XML/HTML with regular expressions. You will have to write a proper parser.

Upvotes: 2

Kobi

Reputation: 138137

You should work that into your parser, which you said you already have.
If you change <start more></end more> to a valid pair, say <more> </more>, any HTML parser should already handle it correctly, even if it isn't a known tag.
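
As a rough illustration of the parser route, here is a minimal sketch in Python (an assumption on my part, since the question does not say which language does the parsing). It uses the standard library's html.parser and assumes the tags have been renamed to <more> ... </more>. The class name MoreCollector and the (depth, text) output shape are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser


class MoreCollector(HTMLParser):
    """Collect the text of each <more> block, tracking nesting depth."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one text buffer per currently open <more>
        self.blocks = []  # (depth, text) for each <more> that has closed

    def handle_starttag(self, tag, attrs):
        # Unknown tags such as <more> are reported like any other tag.
        if tag == "more":
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag == "more" and self.stack:
            text = "".join(self.stack.pop())
            # Depth 1 is an outermost block, depth 2 is nested once, etc.
            self.blocks.append((len(self.stack) + 1, text))

    def handle_data(self, data):
        # Text is credited to the innermost open <more>, if there is one.
        if self.stack:
            self.stack[-1].append(data)


collector = MoreCollector()
collector.feed(
    "outside <more>inside</more> other "
    "<more>1 and <more>2 and thing</more>2 and stuff</more>"
)
collector.close()

for depth, text in collector.blocks:
    print(depth, repr(text))
```

Because the parser reports open and close events in order, nesting falls out of a simple stack, which is the part a single regex cannot track.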

If you insist, a weak regex might be:

/<start more>(((?!<(?:\/end|start) more>).)+)<\/end more>/
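
To show what that pattern does on the sample from the question, here is a small sketch using Python's re module rather than PCRE (an assumption for illustration; the lookahead syntax is the same, and Python needs no pattern delimiters). The DOTALL flag is my addition so that a block spanning a line break still matches, and the name INNERMOST is hypothetical.

```python
import re

# Consume one character at a time, but only while the upcoming text is
# not another opening or closing "more" tag. This forces a match to
# contain no nested tags, so only innermost blocks can match.
INNERMOST = re.compile(
    r"<start more>((?:(?!<(?:/end|start) more>).)+)</end more>",
    re.DOTALL,  # assumption: let "." cross line breaks
)

sample = (
    "Some text outside <start more> some text inside</end more>\n"
    "some other text <start more>1 and some random stuff <start more>2 and \n"
    "thing </end more>2 and random stuff </end more>"
)

matches = INNERMOST.findall(sample)
for body in matches:
    print(repr(body))
```

On this sample it finds the first block and the nested "2 and ... thing" block, but never the outer block that wraps it, which is why it is only a weak substitute for handling the tags in the parser.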

Upvotes: 3
