Reputation: 2724
We have developed a Flash application with a WYSIWYG editor on the backend. We need to offer more functionality in the editor, so we decided to add custom tags < start more > ... < /end more > to our WYSIWYG.
All HTML is parsed and converted to XML. The only problem is that we need to find the start more / end more tags so we can convert them to custom fade effects that reveal more content on a post inside Flash.
Long story short, here is a sample of the XML output.
Some text outside <start more> some text inside</end more>
some other text <start more>1 and some random stuff <start more>2 and
thing </end more>2 and random stuff </end more>
This is the regular expression I use to match start more and end more:
/(<start more>){1,1}(.+?)(<end more>)/
This expression captures the first < start more > and the first < end more > in the string. I tried a negative lookahead assertion to match only the innermost tags, but it is not working.
I hope this makes sense. Let me know if I haven't explained the problem clearly.
Upvotes: 1
Views: 1428
Reputation: 38826
It is not possible to correctly parse XML/HTML with regular expressions. You will have to write a proper parser.
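As a rough illustration of what a small hand-written parser could look like for just these two tags, here is a minimal sketch in Python. It assumes the tags are spelled exactly `<start more>` and `</end more>` as in the question's sample; the function name `find_more_blocks` is made up for this example. A regex is used only to find the tag tokens, while the nesting itself is tracked with a stack.

```python
import re

# Tag spellings copied from the question's sample (an assumption; adjust as needed).
TAG = re.compile(r"<start more>|</end more>")

def find_more_blocks(text):
    """Return a (depth, inner_text) pair for every properly closed block."""
    stack = []   # positions just after each still-open <start more>
    blocks = []
    for m in TAG.finditer(text):
        if m.group() == "<start more>":
            stack.append(m.end())
        elif stack:
            # Closing tag that matches the most recent unclosed opener.
            start = stack.pop()
            blocks.append((len(stack), text[start:m.start()]))
        # A stray closing tag with no opener is silently skipped in this sketch.
    return blocks

sample = "a <start more>1 <start more>2</end more> x</end more> b"
blocks = find_more_blocks(sample)
# blocks == [(1, "2"), (0, "1 <start more>2</end more> x")]
```

Inner blocks are reported before the blocks that contain them, and the depth value tells you how deeply each one is nested.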
Upvotes: 2
Reputation: 138137
You should work that into your parser, which you said you already have.
If you change <start more></end more> to a valid pair, say <more></more>, any HTML parser should already handle it correctly, even if it isn't a known tag.
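To show the idea, here is a small sketch using Python's standard `xml.etree.ElementTree` (an XML parser rather than an HTML one, since the question says the content ends up as XML). The `<post>` wrapper element and the sample string are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's nested text with the tags renamed to <more>.
sample = (
    "<post>some other text <more>1 and some random stuff "
    "<more>2 and thing</more> 2 and random stuff</more></post>"
)

root = ET.fromstring(sample)

# Every <more> element, outer and inner, in document order.
all_more = list(root.iter("more"))

# Innermost blocks are the <more> elements with no <more> descendant.
innermost = [el for el in all_more if el.find(".//more") is None]
innermost_text = [el.text for el in innermost]
```

With a valid tag pair the parser handles the nesting for you, so no lookahead tricks are needed.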
If you insist, a weak regex might be:
/<start more>(((?!<(?:\/end|start) more>).)+)<\/end more>/
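Here is a quick sketch of how that pattern behaves on the sample from the question, written in Python. Two things differ from the pattern above and are my own choices: the repeated group is non-capturing so there is a single capture group, and `re.DOTALL` is added because the sample's inner block spans a line break.

```python
import re

# Match a <start more> ... </end more> pair whose body contains neither tag,
# which means only the innermost blocks can match.
INNERMOST = re.compile(
    r"<start more>((?:(?!<(?:/end|start) more>).)+)</end more>",
    re.DOTALL,  # let "." cross the line break inside the second block
)

sample = (
    "Some text outside <start more> some text inside</end more>\n"
    "some other text <start more>1 and some random stuff <start more>2 and\n"
    "thing </end more>2 and random stuff </end more>"
)

matches = INNERMOST.findall(sample)
# matches == [" some text inside", "2 and\nthing "]
```

The outer block that starts with "1 and some random stuff" is not matched at all, which is why this is only a weak substitute for real parsing.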
Upvotes: 3