Aaron Hayman

Reputation: 8502

Regex: Sub expressions?

I need to create a regex that will match this expression:

replace:sub\:str:new\:Substr

I have to be careful not to match other similar-looking strings, though. For example, this is a different match:

slice:fromIndex[:toIndex]

Specifically:

  1. The string must begin with replace:. If it does not, then nothing should match.
  2. It must match escaped colons: \: but not unescaped colons: :
  3. There must be two matches (the sub string and new substring). For example, in the example string the regex would match: sub\:str and new\:Substr.
  4. The point is to extract the substring and its replacement for use later. The string will always be in the format replace:<subString>:<replacementString>. However, both the subString and the replacementString can contain escaped colons (\:), which is why the example includes them.

I've been unable to come up with a solution. While I'm not an expert at Regex, I'm normally pretty competent. But so far I've only been able to either ignore replace: and simply match on (?<=\:)(?:\\:|[^:])+ to include both substrings, but I end up matching other patterns as well. If I change the look behind to (?<=replace:) I only match the first substring. I just can't figure out how to get it to also match that second substring without including the : separator. I suspect I need to nest the expression somehow but I've been completely unsuccessful at it.
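To make the problem concrete, here is a minimal Python sketch of the two attempts described above. The variable names are only for illustration, and Python's re module is an assumption (the question does not name a language); other regex flavours may behave slightly differently.

```python
import re

example = r"replace:sub\:str:new\:Substr"
other = r"slice:fromIndex[:toIndex]"

# Attempt 1: look behind for any colon. Finds both substrings,
# but also fires on unrelated strings such as the slice expression.
generic = re.compile(r"(?<=\:)(?:\\:|[^:])+")

# Attempt 2: look behind for "replace:". No false positives on the
# slice expression, but only the first substring is found.
anchored = re.compile(r"(?<=replace:)(?:\\:|[^:])+")

generic_on_example = generic.findall(example)
generic_on_other = generic.findall(other)
anchored_on_example = anchored.findall(example)
anchored_on_other = anchored.findall(other)

print(generic_on_example)
print(generic_on_other)
print(anchored_on_example)
print(anchored_on_other)
```

The second substring is lost in attempt 2 because the text right before it is a bare : and not replace:, so the lookbehind can never succeed there.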

Note: I can solve this in the language. I can simply check if the string has the prefix replace: as a separate check. But I'd really like to do the match completely in Regex if it's possible.

Update (some examples)

This should give you an example. As background, after this string is parsed, it would be applied as a kind of filter for another template string.

Upvotes: 0

Views: 3090

Answers (3)

Toto

Reputation: 91428

How about:

^replace:(\w+\\:\w+):(\w+\\:\w+)

The first group will contain sub\:str and the second new\:Substr.

New version according to OP's edit:

^replace:([^:]+(?:\\:)?[^:]+):([^:]+(?:\\:)?[^:]+)

It works for all the given test cases.

If you don't want replace in the whole match, put it in lookbehind:

(?<=^replace:)([^:]+(?:\\:)?[^:]+):([^:]+(?:\\:)?[^:]+)
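A quick way to try the second pattern, sketched in Python (an assumption, since the question does not say which language is used). The one-character input at the end is a hypothetical test of my own, not one of the OP's cases; it is there because each group requires at least two characters ([^:]+ twice).

```python
import re

# Second pattern from this answer, anchored at the start of the string.
pattern = re.compile(r"^replace:([^:]+(?:\\:)?[^:]+):([^:]+(?:\\:)?[^:]+)")

m = pattern.match(r"replace:sub\:str:new\:Substr")
parts = m.groups() if m else None
print(parts)

# Hypothetical edge case: one-character parts do not match,
# because every group needs at least two non-colon characters.
short = pattern.match("replace:a:b")
print(short)
```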

Upvotes: 0

Wiktor Stribiżew

Reputation: 626932

A regex that handles any escape sequence (a backslash followed by any character, as you might have in a C string literal) will look like

replace:([^:\\]*(?:\\.[^:\\]*)*):([^:\\]*(?:\\.[^:\\]*)*)

See the regex demo

NOTE: If it must appear at the start of the string, add ^ at the start of the pattern.

Details:

  • replace: - a literal char sequence
  • ([^:\\]*(?:\\.[^:\\]*)*) - Capturing group 1 matching
    • [^:\\]* - 0+ chars other than : and \
    • (?:\\.[^:\\]*)* - zero or more sequences of:
      • \\. - any escaped char (a \ and any char)
      • [^:\\]* - 0+ chars other than : and \
  • : - an unescaped :
  • ([^:\\]*(?:\\.[^:\\]*)*) - see above.
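The breakdown above can be tried out with a short Python sketch. Python and the helper name parse_replace are my own assumptions for illustration; the pattern is the one from this answer with ^ added, as the note suggests.

```python
import re

# Pattern from this answer, with ^ so it only matches at the string start.
REPLACE_RE = re.compile(
    r"^replace:"
    r"([^:\\]*(?:\\.[^:\\]*)*)"   # group 1: chars other than : and \, or escaped chars
    r":"                          # the unescaped separator
    r"([^:\\]*(?:\\.[^:\\]*)*)"   # group 2: same shape as group 1
)

def parse_replace(spec):
    """Return (substring, replacement) or None if spec is not a replace: filter.

    Hypothetical helper, not part of any library.
    """
    m = REPLACE_RE.match(spec)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(parse_replace(r"replace:sub\:str:new\:Substr"))
print(parse_replace("slice:fromIndex[:toIndex]"))
```

The captured groups still contain the backslashes; turning \: back into : would be a separate step after the match.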

Upvotes: 1

Tensibai

Reputation: 15784

Quite convoluted, but you can nest lookarounds:

replace:(.+?(?!(?<=\\):)):(.+(?!(?<=\\):))

Demo

It ensures that, after replace:, each part can only end at a : that is not itself preceded by a \

Drawback:
In case of 3 parts (a third, unescaped :), the second part will include everything; see the demo for what I mean.
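For reference, a small Python sketch of both the normal case and the drawback. Python is an assumption here, and the three-part input is a made-up string for illustration.

```python
import re

# Nested-lookaround pattern from this answer.
nested = re.compile(r"replace:(.+?(?!(?<=\\):)):(.+(?!(?<=\\):))")

# Normal case: two parts, each containing an escaped colon.
two_parts = nested.search(r"replace:sub\:str:new\:Substr").groups()
print(two_parts)

# Drawback: with a third unescaped colon, the greedy second group
# swallows the rest of the string, separator included.
three_parts = nested.search("replace:a:b:c").groups()
print(three_parts)
```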

Upvotes: 0
