Reputation: 45
How can I replace certain HTML tags with BBCode-like tags?
For example, replace <a ...> ... </a>
with [url ...] ... [/url]
or <code ...> ... </code>
with [code ...] ... [/code]
in a string stored in $var?
Upvotes: 0
Views: 1866
Reputation: 3715
To convert old articles that used HTML tags, I wrote this fairly complicated script. The $body variable contains the article text. The script first replaces every pre and code block with a special marker, so the code inside is left untouched. Once all the other tags have been converted, it replaces each marker with the original snippet. The procedure works with both HTML and BBCode text.
// Let's find all code inside the body. The code can be inside <pre></pre>, <code></code>, or [code][/code] if you
// are using BBCode markup language.
$pattern = '%(?P<openpre><pre>)(?P<contentpre>[\W\D\w\s]*?)(?P<closepre></pre>)|(?P<opencode><code>)(?P<contentcode>[\W\D\w\s]*?)(?P<closecode></code>)|(?P<openbbcode>\[code=?\w*\])(?P<contentbbcode>[\W\D\w\s]*?)(?P<closebbcode>\[/code\])%i';
if (preg_match_all($pattern, $body, $snippets)) {
    $pattern = '%<pre>[\W\D\w\s]*?</pre>|<code>[\W\D\w\s]*?</code>|\[code=?\w*\][\W\D\w\s]*?\[/code\]%i';
    // Replaces the code snippet with a special marker to be able to inject the code in place.
    $body = preg_replace($pattern, '___SNIPPET___', $body);
}
// Replace links.
$body = preg_replace_callback('%(?i)<a[^>]+>(.+?)</a>%',
    // $item is assumed to be the article object being converted; it is only used in the error message.
    function ($matches) use ($item) {
        // Extracts the url. Both quote styles are trimmed, and the case is kept because URL paths are case-sensitive.
        if (preg_match('/\s*(?i)href\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1) {
            $href = trim($others[1], '"\'');
            // Extracts the target.
            if (preg_match('/\s*(?i)target\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1)
                $target = strtolower(trim($others[1], '"\''));
            else
                $target = "_self";
        }
        else
            throw new \RuntimeException(sprintf("Article with idItem = %d has malformed links", $item->idItem));
        return "[url=".$href." t=".$target."]".$matches[1]."[/url]";
    },
    $body
);
// Replace images.
$body = preg_replace_callback('/<img[^>]+>/i',
    // $item is assumed to be the article object being converted; it is only used in the error message.
    function ($matches) use ($item) {
        // Extracts the src. Both quote styles are trimmed, and the case is kept because URL paths are case-sensitive.
        if (preg_match('/\s*(?i)src\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1)
            $src = trim($others[1], '"\'');
        else
            throw new \RuntimeException(sprintf("Article with idItem = %d has malformed images", $item->idItem));
        return "[img]".$src."[/img]";
    },
    $body
);
// Replace other tags.
$body = preg_replace_callback('%</?[a-z][a-z0-9]*[^<>]*>%i',
    function ($matches) {
        $tag = strtolower($matches[0]);
        // Only bare tags (without attributes) are mapped; anything else falls through to default.
        switch ($tag) {
            case '<strong>':
            case '<b>':
                return '[b]';
            case '</strong>':
            case '</b>':
                return '[/b]';
            case '<em>':
            case '<i>':
                return '[i]';
            case '</em>':
            case '</i>':
                return '[/i]';
            case '<u>':
                return '[u]';
            case '</u>':
                return '[/u]';
            case '<strike>':
            case '<del>':
                return '[s]';
            case '</strike>':
            case '</del>':
                return '[/s]';
            case '<ul>':
                return '[list]';
            case '<ol>':
                return '[list=1]';
            case '</ul>':
            case '</ol>':
                return '[/list]';
            case '<li>':
                return '[*]';
            case '</li>':
                return '';
            case '<center>':
                return '[center]';
            case '</center>':
                return '[/center]';
            default:
                return $tag;
        }
    },
    $body
);
// Now we strip the remaining HTML tags.
$body = strip_tags($body);
// Finally we can restore the snippets, converting the HTML tags to BBCode tags.
$snippetsCount = count($snippets[0]);
for ($i = 0; $i < $snippetsCount; $i++) {
    // We try to determine which tags the code is inside: <pre></pre>, <code></code>, [code][/code]
    if (!empty($snippets['openpre'][$i]))
        $snippet = "[code]".PHP_EOL.trim($snippets['contentpre'][$i]).PHP_EOL."[/code]";
    elseif (!empty($snippets['opencode'][$i]))
        $snippet = "[code]".PHP_EOL.trim($snippets['contentcode'][$i]).PHP_EOL."[/code]";
    else
        $snippet = $snippets['openbbcode'][$i].PHP_EOL.trim($snippets['contentbbcode'][$i]).PHP_EOL.$snippets['closebbcode'][$i];
    // A callback is used so that "$1" or "\1" inside a snippet is not interpreted as a backreference.
    $body = preg_replace_callback('/___SNIPPET___/', function () use ($snippet) {
        return PHP_EOL.trim($snippet).PHP_EOL;
    }, $body, 1);
}
//echo $body;
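The marker technique can also be looked at in isolation. What follows is only a minimal sketch of the idea, not the script itself: convertOutsideCode and $toBbcode are hypothetical names, only <pre> and <code> are protected, and the numbered markers restored with strtr() are just one possible way to put the snippets back.

```php
<?php
// Hypothetical helper: runs $convert on everything except code snippets.
function convertOutsideCode(string $body, callable $convert): string
{
    $snippets = [];

    // Step 1: swap each snippet for a numbered marker and remember its content.
    $body = preg_replace_callback(
        '%<(pre|code)>(.*?)</\1>%is',
        function (array $m) use (&$snippets): string {
            $snippets[] = $m[2];
            return '___SNIPPET_' . (count($snippets) - 1) . '___';
        },
        $body
    );

    // Step 2: convert the remaining markup; the markers contain no tags, so they survive.
    $body = $convert($body);

    // Step 3: put the snippets back. strtr() does plain string replacement,
    // so "$1" or "\1" inside a snippet is never read as a backreference.
    $map = [];
    foreach ($snippets as $i => $content) {
        $map['___SNIPPET_' . $i . '___'] = '[code]' . PHP_EOL . trim($content) . PHP_EOL . '[/code]';
    }
    return strtr($body, $map);
}

// Hypothetical converter used only for this demonstration.
$toBbcode = function (string $html): string {
    return preg_replace('#<(/?)(b|i)>#i', '[$1$2]', $html);
};

echo convertOutsideCode('<b>Try</b> <code>echo "<b>$1</b>";</code>', $toBbcode);
```

The <b> outside the snippet becomes [b], while the <b> inside the code block is restored exactly as it was written.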
Upvotes: 1
Reputation: 145482
Converting HTML back to BBCode is not difficult. Libraries exist for that, and I'm certain we have a duplicate answer, but I'm bad at searching too.
Basically, you can use preg_replace like this:
// for 1:1 translations
$text = preg_replace('#<(/?)(b|i|code|pre)>#', '[$1$2]', $text);
// complex tags
$text = preg_replace('#<a href="([^"]+)">([^<]+)</a>#',
"[url=$1]$2[/url]", $text);
But the second case will fail if your input HTML doesn't match the expected form exactly, for instance when the tag carries extra attributes or uses single quotes. If you try to convert exported Word files, such a simplistic approach will fail. You also need more special cases for [img] and other tags.
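To make the link case a bit more tolerant, one option is to match the whole tag first and then pick the attribute out of it in a callback. This is only a sketch under assumptions: htmlLinksToBbcode is a hypothetical name, and it still relies on regular expressions, so badly broken markup (such as Word exports) will not be handled.

```php
<?php
// Hypothetical helper: converts <a> and <img> tags whose attributes may
// appear in any order and with either quote style.
function htmlLinksToBbcode(string $text): string
{
    // Pulls one attribute value out of a single opening tag; returns null if absent.
    $attr = function (string $tag, string $name): ?string {
        $pattern = '/\s' . preg_quote($name, '/') . '\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i';
        if (preg_match($pattern, $tag, $m) !== 1) {
            return null;
        }
        // Only one of the three alternatives matched; the others are empty or missing.
        return ($m[1] ?? '') . ($m[2] ?? '') . ($m[3] ?? '');
    };

    // Links: capture the opening tag and the inner text separately.
    $text = preg_replace_callback(
        '%(<a\b[^>]*>)(.*?)</a>%is',
        function (array $m) use ($attr): string {
            $href = $attr($m[1], 'href');
            // Leave the tag untouched when there is no href to convert.
            return $href === null ? $m[0] : '[url=' . $href . ']' . $m[2] . '[/url]';
        },
        $text
    );

    // Images: a void element, so only the tag itself is matched.
    return preg_replace_callback(
        '%<img\b[^>]*>%i',
        function (array $m) use ($attr): string {
            $src = $attr($m[0], 'src');
            return $src === null ? $m[0] : '[img]' . $src . '[/img]';
        },
        $text
    );
}

echo htmlLinksToBbcode('<a class="ext" href=\'http://example.com/Page\'>site</a> <img alt="x" src="/Logo.png">');
// prints: [url=http://example.com/Page]site[/url] [img]/Logo.png[/img]
```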
Upvotes: 1
Reputation: 34395
Not a trivial task. I looked into this a while back and the best code I came across was this one: cbparser
Upvotes: 0
Reputation: 1998
You could write a custom XSLT stylesheet to convert the formatting and run it through an XSLT processor to get the desired output.
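As a rough idea of what that could look like in PHP, here is a minimal sketch using the XSLTProcessor class. It assumes the dom and xsl extensions are installed and that the input is well-formed XML (real-world HTML often is not). The stylesheet and the htmlToBbcodeXslt name are made up for this illustration and cover only a handful of tags.

```php
<?php
// Hypothetical stylesheet: maps a few HTML tags to BBCode and emits plain text.
// Text nodes and unknown elements are handled by XSLT's built-in rules.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="b|strong">[b]<xsl:apply-templates/>[/b]</xsl:template>
  <xsl:template match="i|em">[i]<xsl:apply-templates/>[/i]</xsl:template>
  <xsl:template match="code|pre">[code]<xsl:apply-templates/>[/code]</xsl:template>
  <xsl:template match="a[@href]">[url=<xsl:value-of select="@href"/>]<xsl:apply-templates/>[/url]</xsl:template>
  <xsl:template match="img[@src]">[img]<xsl:value-of select="@src"/>[/img]</xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: applies the stylesheet to an HTML fragment.
function htmlToBbcodeXslt(string $html, string $xsl): string
{
    $stylesheet = new DOMDocument();
    $stylesheet->loadXML($xsl);

    // The input must be well-formed XML, so the fragment is wrapped in a root element.
    $source = new DOMDocument();
    $source->loadXML('<root>' . $html . '</root>');

    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    return (string) $processor->transformToXml($source);
}

echo htmlToBbcodeXslt('Hello <b>world</b>, see <a href="http://example.com/A">this <i>page</i></a>.', $xsl);
```

Nested tags are handled by the recursive xsl:apply-templates calls, which is the main advantage over a flat list of regular expressions.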
Upvotes: 2