Darren

Reputation: 13128

Get string between tags when multiple tags are present

Just trying to figure this one out, as regex is nowhere near my strong point :( Basically, I'm trying to get the value between BBCode tags, which could look like any of the following:

[center]text[/center]
[left][center]text[/center][/left]
[right][left][center]text[/center][/left][/right]

I currently have this hideous if/else block of code to deal with nested tags like the third example above.

    if (/\[left\]|\[\/left\]/.test(text[2])) {
        // set the value in the [left][/left] tags
        text[2] = text[2].match(/\[left\](.*?)\[\/left\]/)[1];
    } else if (/\[right\]|\[\/right\]/.test(text[2])) {
        // set the value in the [right][/right] tags
        text[2] = text[2].match(/\[right\](.*?)\[\/right\]/)[1];
    } else if (/\[center\]|\[\/center\]/.test(text[2])) {
        // set the value in the [center][/center] tags
        text[2] = text[2].match(/\[center\](.*?)\[\/center\]/)[1];
    }

What I'd like to do is shorten it down to a single regex that grabs the value text from the examples above. So far I've gotten to this expression:

/\[(?:center|left|right)\](.*?)\[\/(?:center|left|right)\]/

But as I found when testing it on RegExr, it doesn't match what I need it to.
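To see what goes wrong, here is a small sketch (JavaScript, assuming Node.js or a browser console) that runs the expression above against the three examples. Because `(.*?)` is lazy, the match starts at the outermost opening tag and stops at the first closing tag it reaches, which in nested input is the innermost `[/center]`, so the inner opening tags end up in the capture.

```javascript
// The single expression from the question, unchanged.
const pattern = /\[(?:center|left|right)\](.*?)\[\/(?:center|left|right)\]/;

const examples = [
  "[center]text[/center]",
  "[left][center]text[/center][/left]",
  "[right][left][center]text[/center][/left][/right]",
];

// The lazy (.*?) stops at the first closing tag, so any inner
// opening tags are left inside the captured group.
const captured = examples.map((s) => s.match(pattern)[1]);
console.log(captured);
// → ["text", "[center]text", "[left][center]text"]
```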

How can I achieve this?

Note

It should only match left|right|center, as the selected text could also contain various other BBCode tags.

If the string looks like this:

[center][left][img]/link/to/img.png[/img][/left][/center]

I want to get what is between the left|center|right tags, which in this case would be:

[img]/link/to/img.png[/img]

More examples:

[center][url=lintosomething.com]LINK TEXT[/url][/center]

Should only get: [url=lintosomething.com]LINK TEXT[/url]

Or

[center]egibibskdfbgfdkfbg sd fgkgb fkgbgk fhwo3g regbiurb geir so go to [url=lintosomething.com]LINK TEXT[/url] and ibgri gbenkenbieurgnerougnerogrnreog erngo[/center]

Wanting:

egibibskdfbgfdkfbg sd fgkgb fkgbgk fhwo3g regbiurb geir so go to [url=lintosomething.com]LINK TEXT[/url] and ibgri gbenkenbieurgnerougnerogrnreog erngo
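One way to cover all of these cases without chaining `if`/`else` branches is to anchor the pattern to the whole string, use a backreference (`\1`) so the closing tag must match the opening one, and repeat until no alignment wrapper is left. A minimal JavaScript sketch, assuming the alignment tags always enclose the entire string; `unwrapAlignment` is a hypothetical helper name:

```javascript
// Hypothetical helper: repeatedly strips an outer [center], [left] or [right]
// wrapper. Assumes the alignment tags enclose the whole string.
function unwrapAlignment(input) {
  // ^...$ anchors the wrapper to the whole string, \1 forces the closing tag
  // to match the opening one, and [\s\S]* also matches across line breaks.
  const wrapper = /^\s*\[(center|left|right)\]([\s\S]*)\[\/\1\]\s*$/;
  let result = input;
  let match;
  while ((match = wrapper.exec(result)) !== null) {
    result = match[2]; // keep only what is inside the outermost wrapper
  }
  return result;
}

console.log(unwrapAlignment("[right][left][center]text[/center][/left][/right]")); // text
console.log(unwrapAlignment("[center][left][img]/link/to/img.png[/img][/left][/center]")); // [img]/link/to/img.png[/img]
console.log(unwrapAlignment("[center][url=lintosomething.com]LINK TEXT[/url][/center]")); // [url=lintosomething.com]LINK TEXT[/url]
```

Because `([\s\S]*)` is greedy, two sibling blocks such as `[center]a[/center][center]b[/center]` would be unwrapped into `a[/center][center]b`; input like that needs a real parser rather than a single regex.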

Upvotes: 1

Views: 623

Answers (3)

echochamber

Reputation: 1815

Edit: OK, I think this fits your needs.

My regex:

/[^\]\[]*\[(\w+)[=\.\"\w]*\][^\]]+\[\/\1\][^\]\[]*/g

Explanation:

  1. Match 0 or more characters that aren't [ or ]
  2. Match a single [
  3. Match 1 or more word characters (letters, digits or _); we'll use this later as a backreference
  4. Match 0 or more of =, ., " or word characters
  5. Match a single ]
  6. Match 1 or more characters that aren't ]
  7. Match a single [
  8. Match a single /
  9. Match the same characters as step 3 (our backreference)
  10. Match a single ]
  11. Match 0 or more characters that aren't [ or ]

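A quick way to check what the expression above returns is `String.prototype.match`, which with the `g` flag gives back every full match (tags included, since the content itself is not in a capture group). A sketch, assuming Node.js or a browser console, run against examples from the question:

```javascript
// echochamber's expression, unchanged.
const bb = /[^\]\[]*\[(\w+)[=\.\"\w]*\][^\]]+\[\/\1\][^\]\[]*/g;

const inputs = [
  "[center][left][img]/link/to/img.png[/img][/left][/center]",
  "[center][url=lintosomething.com]LINK TEXT[/url][/center]",
  "[right][left][center]text[/center][/left][/right]",
];

// With the g flag, match() returns an array of full matches (or null).
for (const s of inputs) {
  console.log(s.match(bb));
}
// → ["[img]/link/to/img.png[/img]"]
// → ["[url=lintosomething.com]LINK TEXT[/url]"]
// → ["[center]text[/center]"]
```

The expression finds the innermost tag pair together with any plain text around it. `(\w+)` accepts any tag name, not only `left`, `right` and `center`, which is why the `[img]` and `[url=lintosomething.com]` pairs are returned, and for purely nested alignment tags the innermost `[center]...[/center]` is still included in the result.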
However, I would like to point out that if you're going to be parsing BBCode, you're almost certainly better off using a dedicated BBCode parser.

Upvotes: 2

Ender2050

Reputation: 6992

You could use a capturing group like this:

(?:\[\w+\])*(\w+)(?:\[\/\w+\])*

Or with a capture group named "value" like this:

(?:\[\w+\])*(?<value>\w+)(?:\[\/\w+\])*

The first and last groups are non-capturing, (?: ...), and the middle group is capturing, (\w+). In the named version, the middle group is (?<value>\w+).

Note: For simplicity, I replaced your center|left|right values with \w+, but you could swap them back in with no impact.
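In JavaScript the named group is read from the `groups` property of the match result (named groups need an ES2018+ engine). A minimal sketch, assuming Node.js or a modern browser; it also shows that `(\w+)` only captures letters, digits and underscores, so content with spaces or nested non-alignment tags is not captured as a whole:

```javascript
// Ender2050's expression with the named group.
const re = /(?:\[\w+\])*(?<value>\w+)(?:\[\/\w+\])*/;

const nested = "[right][left][center]text[/center][/left][/right]";
console.log(nested.match(re).groups.value); // text

// [url=...] is not matched by \[\w+\], so the first match is
// the word "center" inside the opening tag.
const withUrl = "[center][url=lintosomething.com]LINK TEXT[/url][/center]";
console.log(withUrl.match(re).groups.value); // center
```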

I use an app called RegExRX to test the RegEx and inspect the captured values.

Lots of ways you could tweak it. Good luck!

Upvotes: 1

Mike Brant

Reputation: 71384

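Why not just replace all those tags with an empty string?

var rawString = '[right][left][center]text[/center][/left][/right]'; // your input string
// the g flag removes every alignment tag, not only the first one
var cleanedString = rawString.replace(/\[\/?(left|right|center)\]/g, '');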
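Applied to inputs from the question, here is a sketch assuming a JavaScript regex literal with the `g` flag (without it, `replace` only removes the first tag):

```javascript
// Removes every [left], [right], [center] tag and its closing counterpart.
const alignTags = /\[\/?(?:left|right|center)\]/g;

const samples = [
  "[right][left][center]text[/center][/left][/right]",
  "[center][left][img]/link/to/img.png[/img][/left][/center]",
];

for (const s of samples) {
  console.log(s.replace(alignTags, ""));
}
// text
// [img]/link/to/img.png[/img]
```

Unlike extracting the text between the outer tags, this also strips alignment tags that appear in the middle of the content, e.g. `a [center]b[/center] c` becomes `a b c`.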

Upvotes: 2
