SunTastic

Reputation: 141

Recursive Regex in PHP with variable names

I'm trying to build a BBCode-like engine for my website. The catch is that the set of available codes is not known in advance, because the codes are defined by the users. On top of that, the whole thing has to be recursive.

For example:

Hello my name is [name user-id="1"]
I [bold]really[/bold] like cheeseburgers

These are the easy ones, and I got them working.

The problem is what happens when two of those codes follow each other:

I [bold]really[/bold] like [bold]cheeseburgers[/bold]

Or are nested inside each other:

I [bold]really like [italic]cheeseburgers[/italic][/bold]

These codes can also have attributes:

I [bold strength="600"]really like [text font-size="24px"]cheeseburgers[/text][/bold]

The following pattern worked quite well, but it lacks the recursive part, (?R):

(?P<code>\[(?P<code_open>\w+)\s?(?P<attributes>[a-zA-Z0-9_=" .-]*?)\](?:(?P<content>.*?)\[\/(?P<code_close>\w+)\])?)

I just don't know where to put the (?R) recursion token.

Also, the system has to know that in this string

I [bold]really like [italic]cheeseburgers[/italic][/bold] and [bold]football[/bold]

there are 2 "code-objects":

1. [bold]really like [italic]cheeseburgers[/italic][/bold]

and

2. [bold]football[/bold]

... and the content of the first one is

really like [italic]cheeseburgers[/italic]

which again has a code in it

[italic]cheeseburgers[/italic]

whose content is

cheeseburgers

I have been searching the web for two days now and I can't figure it out.

I thought of something like this:

  1. Look for something like [**** attr="foo"] where the attributes are optional and store it in a capturing group
  2. Look up whether there is a closing tag somewhere (can be optional too)
  3. If a closing tag exists, everything between the two tags should be stored as a "content"-capturing group - which then has to go through the same procedure again.
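
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the constant BBCODE_PATTERN, the function parseCodes and the "children" key are hypothetical names, attributes are assumed to be written as key="value", and a closing tag is assumed to repeat the opening name. The (?R) token sits inside the content group, so a nested code is consumed as a whole before the outer closing tag is looked for.

```php
<?php
// Hypothetical pattern (not from the original post). Assumes key="value"
// attributes and a closing tag that repeats the opening tag name.
const BBCODE_PATTERN = '~
    \[(?P<name>\w+)                               # step 1: opening tag name
    (?P<attributes>(?:\s+[\w-]+="[^"]*")*)\s*\]   # step 1: optional attributes
    (?:                                           # step 2: closing tag is optional
        (?P<content>(?:[^\[]++|(?R))*)            # step 3: text or nested codes
        \[/(?P=name)\]
    )?
~x';

// Returns one entry per top-level code; nested codes end up in "children".
function parseCodes(string $text): array
{
    $codes = [];
    preg_match_all(BBCODE_PATTERN, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Split the raw attribute string into name => value pairs.
        $attributes = [];
        preg_match_all('~([\w-]+)="([^"]*)"~', $m['attributes'] ?? '', $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            $attributes[$pair[1]] = $pair[2];
        }
        // "content" is absent when the code has no closing tag.
        $content = $m['content'] ?? null;
        $codes[] = [
            'name'       => $m['name'],
            'attributes' => $attributes,
            'content'    => $content,
            // Run the same procedure again on the content.
            'children'   => $content === null ? [] : parseCodes($content),
        ];
    }
    return $codes;
}
```

Called on the string with [bold] ... [/bold] and [bold]football[/bold], this sketch should yield two top-level entries, with the [italic] code listed under the first entry's "children". Malformed input (a stray "[" or a mismatched closing tag) simply makes the closing part fail to match, which is one reason a hand-written parser may be the safer choice.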

I hope there are some regex specialists who are willing to help me. :(

Thank you!

EDIT

As this might be difficult to understand, here is an input and an expected output:

Input:

[heading icon="rocket"]I'm a cool heading[/heading][textrow][text]<p>Hi!</p>[/text][/textrow]

I'd like to have an array like

array[0][name] = heading
array[0][attributes][icon] = rocket
array[0][content] = I'm a cool heading
array[1][name] = textrow
array[1][content] = [text]<p>Hi!</p>[/text]
array[1][0][name] = text
array[1][0][content] = <p>Hi!</p>

Upvotes: 1

Views: 76

Answers (1)

Niet the Dark Absol

Reputation: 324650

Having written multiple BBCode parsing systems, I can suggest NOT using regexes only. Instead, you should actually parse the text.

How you do this is up to you, but as a general idea you would want to use something like strpos to locate the first [ in your string, then check what comes after it to see if it looks like a BBCode tag and process it if so. Then, search for [ again starting from where you ended up.
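
As a rough illustration of that scanning loop, here is a minimal sketch. The function name tokenize, the token array layout and the assumed tag syntax ([name key="value"] and [/name]) are made up for this example and are not part of any existing library. A small anchored regex is used only to check a single candidate tag; the overall control flow is driven by strpos.

```php
<?php
// Hypothetical helper: walks through $text and returns a flat list of tokens
// (plain text, opening tags, closing tags).
function tokenize(string $text): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($text);
    while ($offset < $length) {
        // Locate the next "[" starting from where we ended up.
        $open = strpos($text, '[', $offset);
        if ($open === false) {
            $tokens[] = ['type' => 'text', 'value' => substr($text, $offset)];
            break;
        }
        if ($open > $offset) {
            $tokens[] = ['type' => 'text', 'value' => substr($text, $offset, $open - $offset)];
        }
        // Check whether what follows looks like a tag (assumed syntax).
        // \G anchors the match at the offset passed as the fifth argument.
        if (preg_match('~\G\[(/?)(\w+)((?:\s+[\w-]+="[^"]*")*)\s*\]~', $text, $m, 0, $open)) {
            $tokens[] = [
                'type'       => $m[1] === '/' ? 'close' : 'open',
                'name'       => $m[2],
                'attributes' => trim($m[3] ?? ''),
            ];
            $offset = $open + strlen($m[0]);
        } else {
            // Not a valid tag: keep the "[" as literal text and move on.
            $tokens[] = ['type' => 'text', 'value' => '['];
            $offset = $open + 1;
        }
    }
    return $tokens;
}
```

The resulting token list can then be turned into a tree or validated in a second pass, which keeps each step small enough to reason about.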

This has certain advantages, such as being able to examine each code and skip it if it's invalid, as well as enforcing proper tag closing order ([bold][italic]Nesting![/bold][/italic] should be considered invalid) and being able to provide meaningful error messages to the user if something is wrong (invalid parameter, perhaps) because the parser knows exactly what is going on, whereas a regex would output something unexpected and potentially harmful.
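
One way to enforce closing order and produce a readable error message is to keep a stack of open tags. The sketch below is only an assumption about how that could look: checkNesting and its $selfClosing parameter (a list of codes that never take a closing tag, such as [name user-id="1"] in the question) are hypothetical, and tags are found with a deliberately loose pattern.

```php
<?php
// Hypothetical validator: returns null when all tags are properly nested,
// otherwise a message describing the first problem found.
function checkNesting(string $text, array $selfClosing = []): ?string
{
    $stack = [];
    // Loose tag detection: "[name ...]" or "[/name]".
    preg_match_all('~\[(/?)(\w+)[^\]]*\]~', $text, $tags, PREG_SET_ORDER);
    foreach ($tags as [, $slash, $name]) {
        if ($slash === '') {
            // Opening tag: remember it unless it never needs closing.
            if (!in_array($name, $selfClosing, true)) {
                $stack[] = $name;
            }
            continue;
        }
        // Closing tag: it must match the most recently opened tag.
        $expected = array_pop($stack);
        if ($expected === null) {
            return sprintf('Unexpected closing tag [/%s]', $name);
        }
        if ($expected !== $name) {
            return sprintf('Expected [/%s] but found [/%s]', $expected, $name);
        }
    }
    return $stack === [] ? null : sprintf('Unclosed tag [%s]', end($stack));
}
```

With this sketch, the [bold][italic]Nesting![/bold][/italic] example is rejected at the [/bold] tag, because [italic] is still on top of the stack at that point.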

It might be more work (or less, depending on your skill with regex), but it's worth it.

Upvotes: 2
