BJagger

Reputation: 171

Patterns in two capturing groups work as intended but fail when combined with a backreference

I am writing a regex that should match a pattern consisting only of optional characters, but I also want it not to match an empty string. I figured I could use a backreference for this, but when I apply it my checks stop working. I probably misunderstand how backreferences work, so please give me a hint about where I am making a mistake. My regex:

(?<pat>^[10]+$)(\k<pat>0*(1(10)*(0|(11))((01*0)(01)*1|(00))*1)*0*)

Both subpatterns applied: no matches at all, whether the input is correct or wrong.


Only the second pattern applied (with anchors added): all correct inputs match, but it also matches an empty line.


Only the first pattern applied: matches any combination of 1s and 0s.


Also, I have a secondary question: if I use a backreference, can I use the anchors ^ and $ to check that the whole text matched by the first group is also matched by the backreference?

As requested in the comments, here are the inputs.

Those should match:
101
000
1010
10100
Those should NOT match (the first row is an empty string):

101110101
1110001
1000
11

And the patterns:

^[10]+$
^0*(1(10)*(0|(11))((01*0)(01)*1|(00))*1)*0*$
(?<pat>^[10]+$)(\k<pat>0*(1(10)*(0|(11))((01*0)(01)*1|(00))*1)*0*)

Upvotes: 1

Views: 53

Answers (1)

The fourth bird

Reputation: 163577

In your pattern you have (?<pat>^[10]+$) followed by \k<pat>, which matches the same text that was captured by the named group pat.

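But the group matches only 0s and 1s up to the anchor $, which asserts the end of the string. A backreference does not re-check the text the group already matched: it requires a second copy of that text at the current position. Because the group must capture at least one character and stops at $, there is no text left for the backreference to match, so the whole pattern can never succeed.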
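A minimal Python sketch of this behaviour, using the inputs from the question. Python's re module writes a named group as (?P<pat>...) and its backreference as (?P=pat), so the patterns below are translations rather than the exact strings from the question. The sketch also shows an alternative that this answer does not use: a lookahead (?=[10]+$) tests "non-empty, only 0s and 1s" without consuming any characters, so the question's second pattern still starts matching at the beginning of the string.

```python
import re

# The question's third pattern, translated to Python's re syntax:
# (?<pat>...) becomes (?P<pat>...) and \k<pat> becomes (?P=pat).
combined = re.compile(
    r"(?P<pat>^[10]+$)((?P=pat)0*(1(10)*(0|(11))((01*0)(01)*1|(00))*1)*0*)"
)

# Alternative (not from the answer): the lookahead checks the input without
# consuming it, then the question's second pattern runs from position 0.
lookahead = re.compile(
    r"^(?=[10]+$)0*(1(10)*(0|(11))((01*0)(01)*1|(00))*1)*0*$"
)

should_match = ["101", "000", "1010", "10100"]
should_not_match = ["", "101110101", "1110001", "1000", "11"]

for s in should_match + should_not_match:
    # combined never matches: after $ nothing is left for (?P=pat) to match.
    print(f"{s!r:13} combined={combined.search(s) is not None} "
          f"lookahead={lookahead.match(s) is not None}")
```

With these inputs, combined matches nothing, while lookahead matches exactly the four strings that should match.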

As you don't want to match empty strings, you can write the pattern as:

^(?<pat>0*(1(10)*(0|11)(01*0(01)*1|00)*1)+0*|0+)$

Explanation

  • ^ Start of string
  • (?<pat> Named group pat
    • 0* Match optional zeroes
    • ( Start group to repeat as a whole part
      • 1(10)*(0|11)(01*0(01)*1|00)*1 The repeated part of your original pattern
    • )+ Close group and repeat this part 1 or more times to prevent matching an empty string
    • 0* Match optional zeroes
    • | Or
    • 0+ Match 1 or more zeroes, so input made only of zeroes (such as 000) still matches, while the empty string does not
  • ) Close group pat
  • $ End of string
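To check the pattern locally, here is a Python sketch (Python writes the named group as (?P<pat>...)). It first runs the inputs from the question, then compares the pattern with the question's second pattern on every binary string up to length 10. Following the explanation, the two should accept the same strings, except that only the question's pattern also accepts the empty string. The length limit of 10 is an arbitrary choice for the sketch.

```python
import itertools
import re

# The pattern from this answer, in Python's named-group syntax.
answer = re.compile(r"^(?P<pat>0*(1(10)*(0|11)(01*0(01)*1|00)*1)+0*|0+)$")

# The question's second pattern, which also matched the empty string.
question = re.compile(r"^0*(1(10)*(0|(11))((01*0)(01)*1|(00))*1)*0*$")

should_match = ["101", "000", "1010", "10100"]
should_not_match = ["", "101110101", "1110001", "1000", "11"]

assert all(answer.match(s) for s in should_match)
assert not any(answer.match(s) for s in should_not_match)

# Exhaustive check: the answer accepts exactly what the question's
# pattern accepts, minus the empty string.
for n in range(11):
    for bits in itertools.product("01", repeat=n):
        s = "".join(bits)
        expected = bool(question.match(s)) and s != ""
        assert bool(answer.match(s)) == expected, s

print("all checks passed")
```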

See a regex demo.

Note that I have removed some capture groups, such as (11), (00) and (01*0), which are not needed when you only care about whether the string matches.

If you are only interested in the named group pat, you can use non-capturing groups (?: instead:

^(?<pat>0*(?:1(?:10)*(?:0|11)(?:01*0(?:01)*1|00)*1)+0*|0+)$
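Switching to non-capturing groups changes only what is captured, not what matches. A short Python sketch (again using Python's (?P<pat>...) spelling) compares the two versions of the pattern; the length limit of 8 for the comparison is an arbitrary choice.

```python
import itertools
import re

# First version: every (...) is a capturing group.
capturing = re.compile(r"^(?P<pat>0*(1(10)*(0|11)(01*0(01)*1|00)*1)+0*|0+)$")

# Second version: the inner groups are non-capturing (?:...).
non_capturing = re.compile(
    r"^(?P<pat>0*(?:1(?:10)*(?:0|11)(?:01*0(?:01)*1|00)*1)+0*|0+)$"
)

print(capturing.groups, non_capturing.groups)  # 6 1

# Both versions accept the same binary strings (checked up to length 8).
for n in range(9):
    for bits in itertools.product("01", repeat=n):
        s = "".join(bits)
        assert bool(capturing.match(s)) == bool(non_capturing.match(s)), s

m = non_capturing.match("10100")
print(m.group("pat"), m.groups())  # 10100 ('10100',)
```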

Upvotes: 2
