HOSSEIN B

Reputation: 323

Parse BBCode in array

I am trying to call a function from BBCode (like WordPress shortcodes), but I couldn't find any code that does that. I only found HTML tag parsers like:

[bold]Bold text[/bold]
->
<b>Bold text</b>

But I want to save it as an array, for example:

[date format="j M, Y" type="jalali"]

to something like this:

array(
    'date' => array(
        'format' => 'j M, Y',
        'type' => 'jalali'
    )
)

Edit:

I wrote some code to do this (sorry if my English is bad):

[date format="Y/m/d" type="jalali"] =>

Step 1: Get the code between "[" and "]":
date format="Y/m/d" type="jalali"

Step 2: Explode the code on spaces:
$code = array('date', 'format="Y/m/d"', 'type="jalali"')

Step 3: Get the shortcode name (offset 0 of $code) and the
remaining elements ($code excluding offset 0):
$name = 'date'
$attr = array('format="Y/m/d"', 'type="jalali"')

Step 4: Now I have the attributes and the shortcode name. But the problem is that if I
put a space in an attribute value, it gets exploded too:
[date format="j M, Y" type="jalali"] =>
$code = array('date', 'format="j', 'M,', 'Y"', 'type="jalali"');

How can I fix this, or get the name and attributes with a regex or anything else?
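
The steps above can be reproduced with a short sketch. The variable names are illustrative, and the use of trim() and array_shift() is only an assumption about how the steps would be coded:

```php
<?php
// Sketch of steps 1-4; reproduces the problem with spaces inside a quoted value.
$tag = '[date format="j M, Y" type="jalali"]';

// Step 1: strip the surrounding "[" and "]".
$inner = trim($tag, '[]');

// Step 2: split on single spaces (this is where quoted values get broken up).
$code = explode(' ', $inner);

// Step 3: the first element is the shortcode name, the rest are the attributes.
$name = array_shift($code);
$attr = $code;

// Step 4: the format value has been split into three separate pieces.
var_export($attr);
```

With this input, $attr ends up with four elements instead of two, because explode() knows nothing about the quotes.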

Upvotes: -1

Views: 370

Answers (2)

mickmackusa

Reputation: 47900

Using a regex with \G (the "continue" metacharacter), you can parse the vital segments of the tag, then loop over the matches to construct the 2D array.

Code:

$text = <<<TEXT
Some text [date format="j M, Y" type="jalali"] containing a date tag.
TEXT;

preg_match_all(
    '/
     (?|                           #use "branch reset" to ensure all attributes are in column one of result
      \[                           #match literal left square bracket
      (date)                       #match the literal tag name
      (?=[^\]]*])                  #validate that substring is eventually followed by a right square bracket
      |                            #OR
      \G(?!^)                      #continue matching from last match, but not from start of string
      (?:                          #encapsulate next OR expression
          ]                        #match ending delimiter of bbtag
          |                        #OR
          \h([a-z\-]+)="([^"]*)"   #match space, capture attribute name, match =", capture value, match "
      )                            #close non-capturing group
     )                             #close branch reset group
     /x',
    $text,
    $m,
    PREG_SET_ORDER
);
$result = [];
foreach ($m as $i => $row) {
    if (!$i) {
        $parent = $row[1];
    } elseif (isset($row[1])) {
        $result[$parent][$row[1]] = $row[2];
    }
}
var_export($result);

If you are going to call a custom function and replace the tag placeholder with dynamic text, then you'll need to match the whole tag with preg_replace_callback() and break it into its parts inside the callback. I'll demonstrate how to make the first key-value pair mandatory and the second key-value pair optional.

Code:

function replaceTag($m) {
    $result[$m[1]][$m[2]] = $m[3];
    if (isset($m[4], $m[5])) {
        $result[$m[1]][$m[4]] = $m[5];
    }
    return json_encode($result);  // https://stackoverflow.com/q/30377948/2943403
}

$text = <<<TEXT
Some text [date format="j M, Y" type="jalali"] containing a date tag.
TEXT;

echo preg_replace_callback(
    '/\[(date) ([a-z\-]+)="([^"]*)"(?: ([a-z\-]+)="([^"]*)")?]/',
    'replaceTag',
    $text
);
// Some text {"date":{"format":"j M, Y","type":"jalali"}} containing a date tag.
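
If the tag can carry any number of attributes and should actually call a function, one possible approach is to capture the whole attribute string in the outer pattern and split it into pairs inside the callback. The following is only a sketch: the $handlers map, the renderTags() name, and the fixed timestamp 0 are illustrative assumptions, and the "type" attribute is ignored because a real Jalali conversion is out of scope here.

```php
<?php
// Hypothetical handler map: tag name => callable that receives the attribute array.
$handlers = [
    'date' => function (array $attr) {
        // Formats the fixed timestamp 0 (UTC) so the output is deterministic.
        // The "type" attribute is not used in this sketch.
        return gmdate($attr['format'] ?? 'Y-m-d', 0);
    },
];

// Hypothetical helper: replaces every known [tag key="value" ...] with its handler's output.
function renderTags($text, array $handlers)
{
    return preg_replace_callback(
        '/\[([a-z]+)((?:\h+[a-z\-]+="[^"]*")*)\h*]/',
        function (array $m) use ($handlers) {
            if (!isset($handlers[$m[1]])) {
                return $m[0]; // unknown tag: leave it untouched
            }
            // Break the captured attribute string into name/value pairs.
            preg_match_all('/([a-z\-]+)="([^"]*)"/', $m[2], $pairs, PREG_SET_ORDER);
            $attr = [];
            foreach ($pairs as $pair) {
                $attr[$pair[1]] = $pair[2];
            }
            return $handlers[$m[1]]($attr);
        },
        $text
    );
}

echo renderTags('Some text [date format="j M, Y" type="jalali"] containing a date tag.', $handlers);
// Some text 1 Jan, 1970 containing a date tag.
```

Tags without a registered handler are returned unchanged, so ordinary BBCode such as [bold] can still be processed by a separate parser.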

Upvotes: 1

Jerson

Reputation: 1745

You can try this using regex:

$code = '[date format="j M, Y" type="jalali"]';

preg_match_all("/\[([^\]]*)\]/", $code, $matches);

$codes = [];

foreach($matches[1] as $match) {
  // Normalize quotes into double quotes
  $match = str_replace("'",'"',$match);
  // Split by space but ignore inside of double quotes
  preg_match_all('/(?:[^\s"]+|"[^"]*")+/',$match,$tokens);
  $parsed = [];
  $prevToken = '';
  foreach($tokens[0] as $token) {
    if(strpos($token,'=') !== false) {
      if($prevToken !== '') {
        // Limit to 2 parts so an "=" inside the value is preserved
        $parts = explode('=',$token,2);
        $parsed[$prevToken][$parts[0]] = trim($parts[1],'"\'');
      }
    } else {
      $parsed[$token] = [];
      $prevToken = $token;
    }
  }

  $codes[] = $parsed;
}

var_dump($codes);

Result:

array(1) {
  [0]=>
  array(1) {
    ["date"]=>
    array(2) {
      ["format"]=>
      string(6) "j M, Y"
      ["type"]=>
      string(6) "jalali"
    }
  }
}

Upvotes: 2
