superted

Reputation: 325

regex underscore delimited pattern matching

Hi, I am struggling to get the regex right for this pattern matching.

I basically want to use regex to match the following pattern.

[anyCharacters]_[anyCharacters]_[anyCharacters]_[anyCharacters]_[1or2]

For example, the string below should match the pattern above:

AA_B_D_ test-adf123_1

I tried the regex below, but it doesn't work:

^[.]+_[.]+_[.]+_[.]+_(1|2)
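
Here is a quick way to see why this fails, sketched in Python with the standard re module. Python is an assumption, since the question does not name a language, and the dot-only test string is made up. Inside square brackets the dot loses its special meaning, so [.] matches only a literal '.'.

```python
import re

# The pattern from the question: [.] is a class containing only a literal dot.
attempt = re.compile(r"^[.]+_[.]+_[.]+_[.]+_(1|2)")

example = "AA_B_D_ test-adf123_1"

# [.]+ matches one or more literal '.' characters and nothing else.
print(re.fullmatch(r"[.]+", "...") is not None)   # True
print(re.fullmatch(r"[.]+", "AA") is not None)    # False

# So the question's example is rejected...
print(attempt.match(example) is not None)         # False
# ...while a (made-up) string of dot-only segments is accepted.
print(attempt.match("._.._._..._2") is not None)  # True
```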

Upvotes: 2

Views: 3035

Answers (3)

Wiktor Stribiżew

Reputation: 626826

Use a [^_] negated character class rather than [.], which only matches a literal dot:

^[^_]+_[^_]+_[^_]+_[^_]+_[12]
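
A minimal check of the negated class against the question's example, using Python's re module (the language is an assumption; the pattern itself is not Python-specific). re.match already anchors at the start of the string; the ^ is kept only to mirror the answer.

```python
import re

# [^_]+ matches one or more characters that are NOT underscores,
# so each segment stops right before the next delimiter.
pattern = re.compile(r"^[^_]+_[^_]+_[^_]+_[^_]+_[12]")

example = "AA_B_D_ test-adf123_1"
m = pattern.match(example)
print(m.group())                        # AA_B_D_ test-adf123_1

# Spaces, hyphens and digits are fine inside a segment; only '_' is excluded.
print(pattern.match("x_y_z") is None)   # True: too few segments
```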

If the pattern must match the whole string, add $:

^[^_]+_[^_]+_[^_]+_[^_]+_[12]$
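
The effect of $ shows up on a made-up input that has an extra character after the final digit. This sketch assumes Python's re module:

```python
import re

no_end = re.compile(r"^[^_]+_[^_]+_[^_]+_[^_]+_[12]")
with_end = re.compile(r"^[^_]+_[^_]+_[^_]+_[^_]+_[12]$")

# Hypothetical input: one extra digit after the final 1.
s = "AA_B_D_ test-adf123_12"

print(no_end.match(s).group())  # AA_B_D_ test-adf123_1  (only a prefix matched)
print(with_end.match(s))        # None: '$' needs the string to end right after [12]
```

In Python specifically, $ also matches just before a single trailing newline, so re.fullmatch(...) is the stricter choice if that matters.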

Also, you may shorten it a bit with a limiting quantifier:

^[^_]+(?:_[^_]+){3}_[12]$
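
To confirm that the shortened form accepts exactly the same strings as the long one, you can run both over a few sample inputs. This assumes Python's re, and the samples are made up for illustration:

```python
import re

long_form = re.compile(r"^[^_]+_[^_]+_[^_]+_[^_]+_[12]$")
# (?:_[^_]+){3} repeats "underscore + segment" exactly three times.
short_form = re.compile(r"^[^_]+(?:_[^_]+){3}_[12]$")

samples = [
    "AA_B_D_ test-adf123_1",  # the question's example
    "a_b_c_d_2",
    "a_b_c_d_3",              # last part is not 1 or 2
    "a_b_c_1",                # only three segments
    "a_b_c_d_e_1",            # five segments
]

for s in samples:
    a = bool(long_form.match(s))
    b = bool(short_form.match(s))
    assert a == b  # both forms agree on every sample
    print(f"{s!r}: {a}")
```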

Note that [12] is the better way to match a single character: it matches either 1 or 2. A grouping construct like (...) (or its non-capturing variant, (?:...)) should be used when the alternatives are multi-character values.
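
The difference between [12] and (1|2) is visible in the match groups. Here is a small Python sketch; the "10"/"20" alternatives are hypothetical and only illustrate a multi-character case:

```python
import re

s = "a_b_c_d_2"

# [12] is a character class: one character, '1' or '2'. No group is created.
m1 = re.match(r"^[^_]+(?:_[^_]+){3}_[12]$", s)
print(m1.groups())  # ()

# (1|2) matches the same text but also creates capturing group 1.
m2 = re.match(r"^[^_]+(?:_[^_]+){3}_(1|2)$", s)
print(m2.groups())  # ('2',)

# Multi-character alternatives need a group: the class [10|20] still
# matches only ONE character from the set {'1', '0', '|', '2'}.
print(re.fullmatch(r"[10|20]", "10"))                # None
print(re.fullmatch(r"(?:10|20)", "10") is not None)  # True
```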

Pattern details:

  • ^ - start of string
  • [^_]+ - 1 or more chars other than _
  • (?:_[^_]+){3} - 3 occurrences of:
    • _ - an underscore
    • [^_]+ - 1 or more chars other than _
  • _ - an underscore
  • [12] - 1 or 2
  • $ - end of string.
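
If you would rather keep the pattern details next to the pattern itself, Python's re.VERBOSE flag ignores unescaped whitespace and allows # comments inside the regex. This is a sketch that assumes Python:

```python
import re

# Each piece of the breakdown above, annotated in place.
pattern = re.compile(
    r"""
    ^            # start of string
    [^_]+        # 1 or more chars other than _
    (?:          # non-capturing group, repeated 3 times:
      _          #   an underscore
      [^_]+      #   1 or more chars other than _
    ){3}
    _            # an underscore
    [12]         # 1 or 2
    $            # end of string
    """,
    re.VERBOSE,
)

print(bool(pattern.match("AA_B_D_ test-adf123_1")))  # True
print(bool(pattern.match("a_b_c_d_3")))              # False
```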

Upvotes: 1

Nahuel Fouilleul

Reputation: 19315

  • . matches any single character (except a newline, by default), _ included
  • .* matches any sequence of characters, as long as possible (greedy), _ included
  • [.]+ matches only the literal . character, one or more times (greedy)
  • [^_]+ matches any character except _, one or more times (greedy)
  • .*? matches any sequence of characters, as short as possible (lazy)

You probably need one of the last two.
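
Applying each of these at the start of the question's example makes the difference concrete. This assumes Python's re, where re.match only tries position 0:

```python
import re

s = "AA_B_D_ test-adf123_1"

print(re.match(r".", s).group())      # A    (one character)
print(re.match(r".*_", s).group())    # AA_B_D_ test-adf123_  (greedy: up to the LAST _)
print(re.match(r"[.]+", s))           # None (the string does not start with a dot)
print(re.match(r"[^_]+", s).group())  # AA   (stops before the first _)
print(re.match(r".*?_", s).group())   # AA_  (lazy: stops at the FIRST _)
```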

^[^_]+_[^_]+_[^_]+_[^_]+_(1|2)

or

^(.*?_){4}[12]

The problem with .*? is that it can backtrack, so this pattern also matches a string with five segments, such as:

one_two_three_four_five_1

The shortest form is:

^([^_]+_){4}[12]
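
The backtracking problem can be reproduced with Python's re (assumed for illustration) by comparing the lazy pattern with the negated-class one:

```python
import re

lazy = re.compile(r"^(.*?_){4}[12]")
negated = re.compile(r"^([^_]+_){4}[12]")

example = "AA_B_D_ test-adf123_1"
extra = "one_two_three_four_five_1"  # five segments before the digit

print(bool(lazy.match(example)), bool(negated.match(example)))  # True True
# .*? can backtrack and swallow an underscore ("four_five_"),
# so the lazy version also accepts five segments.
print(bool(lazy.match(extra)), bool(negated.match(extra)))      # True False
```

Neither pattern ends with $, so a string with extra text after the digit, such as a_b_c_d_1xyz, is still accepted; add $ if the whole string must match.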

Upvotes: 3

ernest_k

Reputation: 45319

Try

^(.+_)+(1|2)$

If you want to specify the number of occurrences:

^(.+_){4}(1|2)$
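
Because . also matches _, the {4} here counts .+_ chunks rather than underscores, so one chunk can contain extra underscores. Here is a quick Python comparison against the negated-class pattern from the first answer; the extra inputs are made up:

```python
import re

any_count = re.compile(r"^(.+_)+(1|2)$")
four = re.compile(r"^(.+_){4}(1|2)$")
strict = re.compile(r"^[^_]+(?:_[^_]+){3}_[12]$")

example = "AA_B_D_ test-adf123_1"
more = "a_b_c_d_e_1"  # hypothetical input with five segments
fewer = "a_1"         # hypothetical input with one segment

for s in (example, more, fewer):
    # Columns: input, (.+_)+ version, {4} version, negated-class version
    print(s, bool(any_count.match(s)), bool(four.match(s)), bool(strict.match(s)))
```

All three accept the question's example. Both of these patterns also accept a_b_c_d_e_1, and the + version even accepts a_1, while the negated-class pattern rejects both.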

Upvotes: 1
