RebeccaBlack

Reputation: 45

Replace HTML tags with BBCode

How can I replace certain HTML tags with BBCode-like tags?

For example, replace <a ...> ... </a> with [url ...] ... [/url], or <code ...> ... </code> with [code ...] ... [/code], in a string stored in $var.

Upvotes: 0

Views: 1866

Answers (4)

noun

Reputation: 3715

To convert old articles that contained HTML tags, I wrote this rather complicated script. The $body variable holds the article text. The script first replaces every pre and code block with a special marker, so that the code inside is left untouched. Once all the other tags have been converted, it replaces each marker with the original code wrapped in [code] tags. It works on text containing either HTML or BBCode.

  // Let's find all code inside the body. The code can be inside <pre></pre>, <code></code>, or [code][/code] if you
  // are using BBCode markup language.
  $pattern = '%(?P<openpre><pre>)(?P<contentpre>[\W\D\w\s]*?)(?P<closepre></pre>)|(?P<opencode><code>)(?P<contentcode>[\W\D\w\s]*?)(?P<closecode></code>)|(?P<openbbcode>\[code=?\w*\])(?P<contentbbcode>[\W\D\w\s]*?)(?P<closebbcode>\[/code\])%i';

  if (preg_match_all($pattern, $body, $snippets)) {

    $pattern = '%<pre>[\W\D\w\s]*?</pre>|<code>[\W\D\w\s]*?</code>|\[code=?\w*\][\W\D\w\s]*?\[/code\]%i';

    // Replaces the code snippet with a special marker to be able to inject the code in place.
    $body = preg_replace($pattern, '___SNIPPET___', $body);
  }


  // Replace links.
  $body = preg_replace_callback('%(?i)<a[^>]+>(.+?)</a>%',

    function ($matches) use ($item) {

      // Extracts the url.
      if (preg_match('/\s+(?i)href\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1) {
        // Strip both quote styles; keep the URL's case, since paths can be case-sensitive.
        $href = trim($others[1], '"\'');

        // Extracts the target.
        if (preg_match('/\s+(?i)target\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1)
          $target = strtolower(trim($others[1], '"\''));
        else
          $target = "_self";
      }
      else
        throw new \RuntimeException(sprintf("Article with idItem = %d has malformed links", $item->idItem));

      return "[url=".$href." t=".$target."]".$matches[1]."[/url]";

    },

    $body
  );


  // Replace images.
  $body = preg_replace_callback('/<img[^>]+>/i',

    function ($matches) use ($item) {

      // Extracts the src.
      // Quotes of either style are stripped; the URL's case is kept, since paths can be case-sensitive.
      if (preg_match('/\s+(?i)src\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1)
        $src = trim($others[1], '"\'');
      else
        throw new \RuntimeException(sprintf("Article with idItem = %d has malformed images", $item->idItem));

      return "[img]".$src."[/img]";

    },

    $body
  );


  // Replace other tags.
  $body = preg_replace_callback('%</?[a-z][a-z0-9]*[^<>]*>%i',

    function ($matches) {
      $tag = strtolower($matches[0]);

      // Stacked case labels replace the boolean case expressions, which only
      // worked through loose comparison. Each branch returns, so no break is needed.
      switch ($tag) {
        case '<strong>':
        case '<b>':
          return '[b]';

        case '</strong>':
        case '</b>':
          return '[/b]';

        case '<em>':
        case '<i>':
          return '[i]';

        case '</em>':
        case '</i>':
          return '[/i]';

        case '<u>':
          return '[u]';

        case '</u>':
          return '[/u]';

        case '<strike>':
        case '<del>':
          return '[s]';

        case '</strike>':
        case '</del>':
          return '[/s]';

        case '<ul>':
          return '[list]';

        case '<ol>':
          return '[list=1]';

        case '</ul>':
        case '</ol>':
          return '[/list]';

        case '<li>':
          return '[*]';

        case '</li>':
          return '';

        case '<center>':
          return '[center]';

        case '</center>':
          return '[/center]';

        default:
          // Unknown tags are returned as-is and removed later by strip_tags().
          return $tag;
      }
    },

    $body
  );


  // Now we strip the remaining HTML tags.
  $body = strip_tags($body);


  // Finally we can restore the snippets, converting the HTML tags to BBCode tags.
  $snippetsCount = count($snippets[0]);

  for ($i = 0; $i < $snippetsCount; $i++) {
    // We try to determine which tags the code is inside: <pre></pre>, <code></code>, [code][/code]
    if (!empty($snippets['openpre'][$i]))
      $snippet = "[code]".PHP_EOL.trim($snippets['contentpre'][$i]).PHP_EOL."[/code]";
    elseif (!empty($snippets['opencode'][$i]))
      $snippet = "[code]".PHP_EOL.trim($snippets['contentcode'][$i]).PHP_EOL."[/code]";
    else
      $snippet = $snippets['openbbcode'][$i].PHP_EOL.trim($snippets['contentbbcode'][$i]).PHP_EOL.$snippets['closebbcode'][$i];

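    // A callback is used so that "$1" or "\1" inside the code is not treated as a backreference.
    $body = preg_replace_callback('/___SNIPPET___/', function () use ($snippet) {
      return PHP_EOL.trim($snippet).PHP_EOL;
    }, $body, 1);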
  }

  //echo $body;
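
The protect-and-restore idea used above can also be tried in isolation. The following is only a minimal sketch: the helper names protectCode() and restoreCode() and the numbered marker format are assumptions made for illustration, not part of the script above.

```php
<?php
// Hypothetical helpers illustrating the marker technique; names are invented.

/**
 * Replace every <pre> or <code> block with a numbered marker.
 * Returns the modified text and the list of extracted contents.
 */
function protectCode(string $html): array
{
    $snippets = [];
    $text = preg_replace_callback(
        '%<(pre|code)\b[^>]*>(.*?)</\1>%is',
        function (array $m) use (&$snippets): string {
            $snippets[] = $m[2];
            return '___SNIPPET_' . (count($snippets) - 1) . '___';
        },
        $html
    );
    return [$text, $snippets];
}

/** Put each snippet back in place of its marker, wrapped in [code] tags. */
function restoreCode(string $text, array $snippets): string
{
    return preg_replace_callback(
        '/___SNIPPET_(\d+)___/',
        function (array $m) use ($snippets): string {
            // Returned text is inserted literally, so "$1" in a snippet is safe.
            return '[code]' . trim($snippets[(int) $m[1]]) . '[/code]';
        },
        $text
    );
}

[$text, $snippets] = protectCode('<b>Run</b> <code>echo "<b>$1</b>";</code>');
$text = preg_replace('#<(/?)(b|i)>#i', '[$1$2]', $text); // convert the rest
echo restoreCode($text, $snippets), PHP_EOL;
// prints: [b]Run[/b] [code]echo "<b>$1</b>";[/code]
```

Numbering the markers means each snippet is restored by index rather than by order of replacement, and the tags inside the code block are never seen by the conversion step.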

Upvotes: 1

mario

Reputation: 145482

Reverse HTML to BBCODE conversions are not difficult. Libraries exist for that, and I'm certain we have a duplicate answer. But I'm bad at searching too.

Basically you can use preg_replace like this:

 // for 1:1 translations
 $text = preg_replace('#<(/?)(b|i|code|pre)>#', '[$1$2]', $text);

 // complex tags
 $text = preg_replace('#<a href="([^"]+)">([^<]+)</a>#',
             "[url=$1]$2[/url]", $text);

But the second case will fail if your input HTML doesn't match the expected form exactly, for example when the tag carries extra attributes or the link text contains nested tags. If you try to convert exported Word files, such a simplistic approach will fail. You also need more special cases for [img] and similar tags.
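
If the input is too irregular for fixed patterns, one alternative is to let PHP's DOM extension parse the markup and walk the resulting tree. This is only a sketch under assumptions: the function names htmlToBbcode() and nodeToBbcode() are hypothetical, and the tag mapping covers just a handful of tags.

```php
<?php
// Sketch only: function names and the tag mapping are illustrative assumptions.

function nodeToBbcode(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        return $node->nodeValue;
    }
    if (!($node instanceof DOMElement)) {
        return ''; // comments, processing instructions, etc.
    }

    // Convert the children first, then wrap them according to the tag.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= nodeToBbcode($child);
    }

    switch (strtolower($node->nodeName)) {
        case 'b':
        case 'strong':
            return '[b]' . $inner . '[/b]';
        case 'i':
        case 'em':
            return '[i]' . $inner . '[/i]';
        case 'code':
        case 'pre':
            return '[code]' . $inner . '[/code]';
        case 'a':
            return '[url=' . $node->getAttribute('href') . ']' . $inner . '[/url]';
        case 'img':
            return '[img]' . $node->getAttribute('src') . '[/img]';
        default:
            return $inner; // unknown tags are dropped, their text is kept
    }
}

function htmlToBbcode(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // The meta tag hints UTF-8 to the parser; the <div> gives us a known root.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return nodeToBbcode($doc->getElementsByTagName('div')->item(0));
}

echo htmlToBbcode('<a class="ext" target=_blank href=\'/Page?id=1\'>my <b>link</b></a>'), PHP_EOL;
// prints: [url=/Page?id=1]my [b]link[/b][/url]
```

Because the parser normalizes attribute order, quoting style, and nesting, the link above converts even though it would not match the `<a href="...">` pattern.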

Upvotes: 1

ridgerunner

Reputation: 34395

Not a trivial task. I looked into this a while back, and the best code I came across was cbparser.

Upvotes: 0

Rasika

Reputation: 1998

You could write a customized XSLT stylesheet to convert the formatting and run the input through an XSLT processor to get the desired output.
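
As a rough illustration of that idea, the sketch below runs a tiny stylesheet through PHP's XSLTProcessor. It assumes the xsl extension is installed; the stylesheet, the function name xsltToBbcode(), and the choice to parse the input with DOMDocument::loadHTML() first (XSLT needs a well-formed tree) are all assumptions made for this example.

```php
<?php
// Sketch only: requires PHP's xsl extension. The stylesheet is illustrative.

$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="b|strong">[b]<xsl:apply-templates/>[/b]</xsl:template>
  <xsl:template match="i|em">[i]<xsl:apply-templates/>[/i]</xsl:template>
  <xsl:template match="code|pre">[code]<xsl:apply-templates/>[/code]</xsl:template>
  <xsl:template match="a">[url=<xsl:value-of select="@href"/>]<xsl:apply-templates/>[/url]</xsl:template>
</xsl:stylesheet>
XSL;

function xsltToBbcode(string $html, string $xsl): string
{
    // Parse the (possibly sloppy) HTML into a DOM tree first.
    $source = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $source->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $style = new DOMDocument();
    $style->loadXML($xsl);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($style);

    // Elements without a template fall through to XSLT's built-in rules,
    // which keep their text and discard the tags.
    return (string) $processor->transformToXml($source);
}

if (class_exists('XSLTProcessor')) {
    echo trim(xsltToBbcode('<a href="/x">my <b>link</b></a>', $xsl)), PHP_EOL;
}
```

New tags are supported by adding one template per tag, which keeps the mapping declarative and separate from the PHP code.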

Upvotes: 2
