0x45

Reputation: 819

Testcase of Regex is failing

I've got the following regex:

^[EC]D_V_[a-zA-Z]{5}____([0-9]{8})_[0-9]{3}_[a-zA-Z](_[0-9]{1,7})?\.([^<>:\\”\/\\|\\*\\?]{3,4})(\.gz)?

and this test data:

CD_V_DoSto____00000000_255_A_952086.445 
ED_V_DoSto____99999999_255_A_91459._416.gz 

Why is the second one failing, while the first one still works if I rename it to CD_V_DoSto____00000000_255_A_952086.445.gz?

I think the [0-9]{8} part is causing the problem, but I couldn't verify it...

Here you can test it: regex101

Upvotes: 2

Views: 98

Answers (1)

Wiktor Stribiżew

Reputation: 627086

There are three things to consider here:

  • By default, the ^ anchor matches only at the start of the whole string. To make it match at the start of each line, prepend (?m) to the pattern or enable the m (multiline) option.
  • The Group 3 pattern matches any 3 or 4 chars except those in the negated set. Since . is not in that set and the {3,4} quantifier is greedy, in a name ending in 445.gz Group 3 grabs the . before gz, so .gz never lands in Group 4. Add . to the negated character set.
  • If the whole string should match, do not forget to add $ at the end of the pattern in the regex tester. With Java's String#matches, you need neither ^ nor $: the method only succeeds when the pattern matches the entire string.
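
The first two points can be reproduced in isolation. The following is only a minimal sketch: the class name, the helper methods and the shortened patterns (just the prefix, and just the extension tail without the quote and backslash entries) are made up for illustration and are not part of the original regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorAndGreedyDemo {
    // Hypothetical, shortened tails of the pattern: extension group plus optional ".gz".
    static final Pattern DOT_ALLOWED = Pattern.compile("\\.([^<>:/|*?]{3,4})(\\.gz)?");
    static final Pattern DOT_EXCLUDED = Pattern.compile("\\.([^.<>:/|*?]{3,4})(\\.gz)?");

    // Returns {extension group, gz group} for the first match, or null if nothing matches.
    static String[] tailGroups(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    // Counts how many times the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String twoLines = "CD_V_DoSto\nED_V_DoSto";
        // Without (?m), ^ matches only at the very start of the input.
        System.out.println(countMatches("^[EC]D_V_", twoLines));     // 1
        // With (?m), ^ also matches right after the line break.
        System.out.println(countMatches("(?m)^[EC]D_V_", twoLines)); // 2

        // The greedy {3,4} takes the dot as its fourth char, so the gz group stays empty.
        String[] a = tailGroups(DOT_ALLOWED, ".445.gz");
        System.out.println(a[0] + " | " + a[1]); // 445. | null
        // With the dot excluded, the extension stops at 3 chars and ".gz" is captured.
        String[] b = tailGroups(DOT_EXCLUDED, ".445.gz");
        System.out.println(b[0] + " | " + b[1]); // 445 | .gz
    }
}
```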

See the fixed regex fiddle:

^[EC]D_V_[a-zA-Z]{5}____([0-9]{8})_[0-9]{3}_[a-zA-Z](_[0-9]{1,7})?\.([^.<>:”\/|*?]{3,4})(\.gz)?$

In Java, you may use

boolean isValid = s.matches("[EC]D_V_[a-zA-Z]{5}____([0-9]{8})_[0-9]{3}_[a-zA-Z](_[0-9]{1,7})?\\.([^.<>:”/|*?]{3,4})(\\.gz)?");
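
If the check runs often, the pattern could be compiled once and reused. Below is a rough sketch of that idea, run against the file names from the question (without surrounding whitespace). FileNameValidator and isValid are hypothetical names, and the ” from the character class is written as the escape \u201D only to keep the source file ASCII-safe.

```java
import java.util.regex.Pattern;

class FileNameValidator {
    // Same pattern as above, compiled once. \u201D is the right double quotation mark.
    private static final Pattern FILE_NAME = Pattern.compile(
        "[EC]D_V_[a-zA-Z]{5}____([0-9]{8})_[0-9]{3}_[a-zA-Z](_[0-9]{1,7})?"
        + "\\.([^.<>:\u201D/|*?]{3,4})(\\.gz)?");

    // Matcher#matches succeeds only if the whole name matches, so no ^ or $ is needed.
    static boolean isValid(String name) {
        return FILE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("CD_V_DoSto____00000000_255_A_952086.445"));     // true
        System.out.println(isValid("ED_V_DoSto____99999999_255_A_91459._416.gz"));  // true
        System.out.println(isValid("CD_V_DoSto____00000000_255_A_952086.445.gz"));  // true
        // Extra text after ".gz" is rejected because the match must cover the whole string.
        System.out.println(isValid("CD_V_DoSto____00000000_255_A_952086.445.gzx")); // false
    }
}
```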

Upvotes: 3
