Andrew Madden

Reputation: 73

Regex to exclude previously captured characters

How does the exclude operator ^ behave with previously captured values?

Notice the [^\1] in the regex below:

"abcdefgabcdefga".scan(/(\w)([^\1]+)(\1)/)
[
  [0] [
      [0] "a",
      [1] "bcdefgabcdefg",
      [2] "a"
  ]
]

Notice the [^a] in the regex below:

"abcdefgabcdefga".scan(/(\w)([^a]+)(\1)/)

[
  [0] [
      [0] "a",
      [1] "bcdefg",
      [2] "a"
  ]
]

It seems that \1 and a hardcoded 'a' are treated as two different things?

I'm using ruby 2.1.1p76

Upvotes: 2

Views: 51

Answers (2)

Coenwulf

Reputation: 1937

As the other answers have already said, \1 is not a back reference when it appears inside square brackets. The proposed solutions are viable, but here is one more alternative, which uses non-greedy matching:

/(\w)(.+?)(\1)/

Because +? is lazy, the middle group stops at the first position where the back reference \1 can match.
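As a rough check of the non-greedy pattern, the sketch below runs it on the string from the question and on a second, made-up input ("aaa") to show one edge case. The variable names are only illustrative.

```ruby
text = "abcdefgabcdefga"

# The lazy quantifier stops at the first later "a", so the middle
# group is "bcdefg" rather than "bcdefgabcdefg".
lazy = text.scan(/(\w)(.+?)(\1)/)

# Edge case: .+? must still consume at least one character, so with
# adjacent repeats the middle group can itself be the captured character.
lazy_adjacent = "aaa".scan(/(\w)(.+?)(\1)/)

p lazy           # [["a", "bcdefg", "a"]]
p lazy_adjacent  # [["a", "a", "a"]]
```

So the lazy version finds the nearest repeat, but it does not strictly forbid the captured character from appearing in the middle group.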

Upvotes: 0

anubhava

Reputation: 784958

No, \1 does indeed refer back to the captured a. The problem is [^\1], which is not the same as [^a], because \1 loses its special meaning inside a character class.
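To see the difference concretely, the two patterns from the question can be run side by side. The last check in this sketch rests on an assumption: that Ruby's regex engine reads \1 inside brackets as an octal escape for the character with code 1, not as a back reference. The variable names are only illustrative.

```ruby
text = "abcdefgabcdefga"

# Inside brackets \1 is not a back reference, so [^\1] does not exclude "a"
# and the greedy middle group runs on to the last "a".
with_class = text.scan(/(\w)([^\1]+)(\1)/)

# Hardcoding the letter does exclude it, so the middle group stops
# before the next "a".
with_literal = text.scan(/(\w)([^a]+)(\1)/)

# Assumption: [\1] is parsed as an octal escape, i.e. the character "\x01".
matches_control_char = !("\x01" =~ /[\1]/).nil?

p with_class            # [["a", "bcdefgabcdefg", "a"]]
p with_literal          # [["a", "bcdefg", "a"]]
p matches_control_char  # true
```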

The correct way to do this is with a negative lookahead, like this:

(\w)(?:(?!\1).)+(\1)
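A short sketch of how this behaves with scan on the string from the question. Note that in the pattern as written the middle part is a non-capturing group, so scan returns only two captures. The second variant, with an extra pair of parentheses, is my own addition to recover the middle text. The variable names are only illustrative.

```ruby
text = "abcdefgabcdefga"

# Pattern as given: (?!\1). matches one character only when the back
# reference does not match at that position. The middle is non-capturing,
# so scan returns just the first and last groups.
as_given = text.scan(/(\w)(?:(?!\1).)+(\1)/)

# Wrapping the repeated group in parentheses captures the middle as well.
with_middle = text.scan(/(\w)((?:(?!\1).)+)(\1)/)

p as_given     # [["a", "a"]]
p with_middle  # [["a", "bcdefg", "a"]]
```

Because the lookahead is checked before every character, the middle group can never contain the captured character, and the back reference keeps its meaning since it sits outside any character class.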


Upvotes: 1
