Allen Liu

Reputation: 4038

How to find text within brackets with some exceptions by regular expressions?

I have a regex, /^\[(text:\s*.+?\s*)\]/mi, that currently captures text in brackets when the text begins with text:. Here is an example where it works:

[text: here is my text that is
captured within the brackets.]

Now I would like to add an exception so that certain brackets are allowed inside the match, as in the case below:

[text: here is my text that is
captured within the brackets
and also include ![](/some/path)]

Basically, I need it to allow the ![](/some/path) brackets in the match.
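
To make the problem concrete, here is a minimal Ruby sketch of how the current pattern behaves on both inputs. The helper name first_capture is made up for illustration. The lazy .+? stops at the first ], which in the second input is the one inside ![].

```ruby
# Hypothetical helper: returns capture group 1 of the question's regex, or nil.
def first_capture(str)
  str[/^\[(text:\s*.+?\s*)\]/mi, 1]
end

plain  = "[text: here is my text that is\ncaptured within the brackets.]"
nested = "[text: here is my text that is\ncaptured within the brackets\nand also include ![](/some/path)]"

p first_capture(plain)
# => "text: here is my text that is\ncaptured within the brackets."

p first_capture(nested)
# => "text: here is my text that is\ncaptured within the brackets\nand also include !["
```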

Any help would be greatly appreciated. Thanks.

Update:

Here are some cases where the text inside the brackets should be matched:

[text: here is my text that is
captured within the brackets
and also include ![](/some/path)]

[text: here is my text that is
captured within the brackets
and also include ![](/some/path) and some more text]

[text: ![](/some/path)]

![text: cat]

Here are some cases where it should not match:

[text: here is my text that is
captured within the brackets
and also include ![invalid syntax](/some/path)]

[text: here is my text that is
captured within the brackets
and also include ![] (/some/path)]

[text: here is my text that is
captured within the brackets
and also include ! [](/some/path)]

[text: here is my text that is
captured within the brackets
and also include ! [] (/some/path)]

Upvotes: 10

Views: 405

Answers (5)

Cary Swoveland

Reputation: 110685

You can use your regex, slightly modified and simplified.

str =<<_
[text: here is my text that is
captured within the brackets
and also includes ![](/some/path)]
and other stuff
_

r = /
    ^       # match beginning of line
    \[text: # match string
    .+?     # match one or more characters lazily
    \]      # match right bracket
   /imx      # case indifferent (i), multiline (m) and extended/free-spacing (x) modes

PLACEHOLDER = 0.chr
SUBSTITUTE_OUT = '![](/'

puts str.gsub(SUBSTITUTE_OUT, PLACEHOLDER).
  scan(r).
  map { |s| s.gsub(PLACEHOLDER, SUBSTITUTE_OUT) }

[text: here is my text that is
captured within the brackets
and also includes ![](/some/path)]

Note that, in multiline mode, \s*.+?\s* matches the same text as .+?. Also (as @sawa noted), you could replace .+? with [^\]]+, in which case you would not need multiline mode.
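
As a rough sketch of that last variant, assuming the same placeholder trick: the method name bracketed_text and its keyword arguments are invented here for illustration and are not part of the code above. A negated character class crosses newlines on its own, so only the i flag remains.

```ruby
# Hypothetical wrapper around the placeholder approach.
# token:       the inner bracket sequence to hide from the regex
# placeholder: a character assumed not to occur in the input (NUL here)
def bracketed_text(str, token: '![](/', placeholder: "\0")
  pattern = /^\[text:[^\]]+\]/i   # no m flag: [^\]] already matches newlines
  str.gsub(token, placeholder)
     .scan(pattern)
     .map { |s| s.gsub(placeholder, token) }
end

sample = "[text: here is my text that is\n" \
         "captured within the brackets\n" \
         "and also includes ![](/some/path)]\n" \
         "and other stuff\n"

puts bracketed_text(sample)
```

For the sample string this should print the same three lines as the original version.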

Edit: I updated SUBSTITUTE_OUT in light of the OP's edit of the question. This illustrates one advantage of this approach: the regex is not affected by changes to the inner matching text.

Upvotes: 4

Michelle Welcks

Reputation: 3904

I've used a negative lookbehind in this regex to assert that a closing bracket doesn't immediately follow an opening bracket:

^\[(text:.+?)(?<!\[)\]

Here's the walk-through.

^           # Start of line anchor.
\[          # Match opening bracket '['
(           # Start capturing group 1. 
text:       # Match 'text:'
.+?         # Match any character one or more times lazily.
)           # End capturing group 1. 
(?<!        # Begin negative lookbehind.
\[          # '[' must not immediately precede the closing bracket.
)           # End negative lookbehind.
\]          # Match closing bracket.

Here's a demo.
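
As a rough local check of the pattern, here is a small Ruby sketch. It assumes the question's mi flags, since no flags are stated above and in Ruby the dot only crosses newlines with m. The helper name capture_with_lookbehind is hypothetical.

```ruby
# Hypothetical helper: returns capture group 1, or nil when nothing matches.
# The mi flags are an assumption carried over from the question's regex.
def capture_with_lookbehind(str)
  str[/^\[(text:.+?)(?<!\[)\]/mi, 1]
end

p capture_with_lookbehind("[text: ![](/some/path)]")
# => "text: ![](/some/path)"

p capture_with_lookbehind("[text: first line\nsecond line ![](/some/path)]")
# => "text: first line\nsecond line ![](/some/path)"
```

The lookbehind only skips a ] that directly follows a [, so any other bracket pair inside the text (for example ![invalid syntax]) still ends the match at its closing bracket.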

Upvotes: 3

sawa

Reputation: 168101

I don't understand how the newline character is relevant to what you describe, so I removed ^.

/\[(text:(?:[^\[\]]|!\[\][\/\w]+)+)\]/i

Upvotes: 3

Tim Pietzcker

Reputation: 336158

OK, so you want to allow either

  • a character that's not a bracket or
  • the sequence ![]

between the starting and ending bracket. This gives you the regex

/^\[(text:[^\[\]]*(?:!\[\][^\[\]]*)*)\]/mi

Explanation:

^           # Start of line
\[          # Match [
(           # Start of capturing group
 text:      # Match text:
 [^\[\]]*   # Match any number of characters except [ or ]
 (?:        # Optional non-capturing group:
  !\[\]     #  Match ![]
  [^\[\]]*  #  Match any number of characters except [ or ]
 )*         # Repeat as needed (0 times is OK)
)           # End of capturing group
\]          # Match ]

Test it live on regex101.com.
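
For a quick offline check, the regex can be run against a few of the question's cases in Ruby. This is only a sketch, and the helper name capture_text is made up for illustration.

```ruby
# Hypothetical helper: returns capture group 1, or nil when nothing matches.
def capture_text(str)
  str[/^\[(text:[^\[\]]*(?:!\[\][^\[\]]*)*)\]/mi, 1]
end

should_match = [
  "[text: here is my text that is\ncaptured within the brackets\nand also include ![](/some/path)]",
  "[text: here is my text that is\ncaptured within the brackets\nand also include ![](/some/path) and some more text]",
  "[text: ![](/some/path)]"
]

should_not_match = [
  "[text: here is my text that is\ncaptured within the brackets\nand also include ![invalid syntax](/some/path)]",
  "[text: here is my text that is\ncaptured within the brackets\nand also include ! [](/some/path)]"
]

should_match.each     { |s| puts "expected a match, got: #{capture_text(s).inspect}" }
should_not_match.each { |s| puts "expected nil, got:     #{capture_text(s).inspect}" }
```

Because the pattern only checks for the ![] token itself, the question's ![] (/some/path) case, with a space before the parenthesis, would still be captured. Rejecting it would mean matching the parenthesised path as part of the allowed sequence as well.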

Upvotes: 6

Mayur Koshti

Reputation: 1852

I think you should try the following regex:

^\[(text:.*?(?<!\[))\]

Upvotes: 0
