Reputation: 2066
I have this function to parse bbcode -> html:
$this->text = preg_replace(array(
'/\[b\](.*?)\[\/b\]/ms',
'/\[i\](.*?)\[\/i\]/ms',
'/\[u\](.*?)\[\/u\]/ms',
'/\[img\](.*?)\[\/img\]/ms',
'/\[email\](.*?)\[\/email\]/ms',
'/\[url\="?(.*?)"?\](.*?)\[\/url\]/ms',
'/\[size\="?(.*?)"?\](.*?)\[\/size\]/ms',
'/\[youtube\](.*?)\[\/youtube\]/ms',
'/\[color\="?(.*?)"?\](.*?)\[\/color\]/ms',
'/\[quote](.*?)\[\/quote\]/ms',
'/\[list\=(.*?)\](.*?)\[\/list\]/ms',
'/\[list\](.*?)\[\/list\]/ms',
'/\[\*\]\s?(.*?)\n/ms'
),array(
'<strong>\1</strong>',
'<em>\1</em>',
'<u>\1</u>',
'<img src="\1" alt="\1" />',
'<a href="mailto:\1">\1</a>',
'<a href="\1">\2</a>',
'<span style="font-size:\1%">\2</span>',
'<object width="450" height="350"><param name="movie" value="\1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="\1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="450" height="350"></embed></object>',
'<span style="color:\1">\2</span>',
'<blockquote>\1</blockquote>',
'<ol start="\1">\2</ol>',
'<ul>\1</ul>',
'<li>\1</li>'
),
$original
);
Problem is, how to unparse this, like html -> bbcode?
Upvotes: 2
Views: 923
Reputation: 70490
It's pretty safe to say it's nigh impossible to build a reliable way to convert html to bbcode with just a slew of regexes. Use a parser (DOMDocument for instance), remove invalid elements & attributes with xpath's & inspection and then recursively walk it creating a bbcode string on the way (or just ignore invalid tags / attributes on the way).
Upvotes: 5
Reputation: 101936
If you know exactly that the HTML code you want to de-bbcode was en-bbcoded using your method, than do the following:
Switch the two array you pass to preg_replace
.
In the array with the HTML code, do the following for every element: Prepend #
to the string. Append #s
. Replace \1
(and \2
aso) with (.*?)
.
For the array with the bbcodes do thefollowing with every element: Remove /
at the beginning and /ms
at end. Replace \s
with . Remove all
\
. Remove all ?
. Replace the first (.*)
in the string with $1
and the second with $2
.
This should do. If any problems: Ask ;)
Upvotes: 3
Reputation: 51411
Don't.
Instead, store both the original unparsed text and the processed parsed text. Yes, this doubles the storage requirement, but it also makes it blindingly easy to:
Upvotes: 7