Cogslave

Reputation: 2643

Regex match only if last group is exactly n characters long

I am trying to create a regex that will match a pattern, but not match if the last group is not exactly 4 characters long.

Example:
Regex: (.{1,}-)(.{1,}-)(.{1,}-)(\d{4,4})
Good input: A-AAAA-A-0001
Bad input: A-AAAA-A-00011

My regex fails: it picks up A-AAAA-A-0001 from both inputs.

Upvotes: 3

Views: 1271

Answers (3)

everag

Reputation: 7672

Firstly, I'd advise you to simplify your Regex a little bit. My suggestion is:

(\w+-){3}(\d{4})

Since you have 3 groups of word characters followed by a - sign, this is pretty straightforward.

Now, in order to capture only those desired matches, if you are testing exactly those strings, you only need to add the ^ and $ delimiters.

^(\w+-){3}(\d{4})$

Please check this Regex101 link to see it in action.
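To see the difference the anchors make, here is a minimal sketch in Python. The choice of Python's `re` module and the helper name `is_valid` are assumptions for illustration; the question does not say which regex engine is in use.

```python
import re

# The simplified pattern, without and with anchors.
UNANCHORED = re.compile(r"(\w+-){3}(\d{4})")
ANCHORED = re.compile(r"^(\w+-){3}(\d{4})$")


def is_valid(text):
    """Return True only if the whole string fits the pattern (hypothetical helper)."""
    return ANCHORED.search(text) is not None


for sample in ("A-AAAA-A-0001", "A-AAAA-A-00011"):
    found = UNANCHORED.search(sample)
    # The unanchored pattern finds A-AAAA-A-0001 inside both inputs;
    # the anchored one accepts only the first input.
    print(sample, "| unanchored:", found.group(0) if found else None,
          "| anchored:", is_valid(sample))
```

In Python specifically, `$` also matches just before a trailing newline, so `re.fullmatch` with the unanchored pattern may be the stricter choice there.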

Upvotes: 1

Iain Fraser

Reputation: 6728

The following regex will pick up matches from inside of a string (i.e. it will find matching substrings):

(?<=\s|^)([^\s]+-){3}(\d{4})(?=\s|$)

The following will pick up matches for the whole string only:

^([^\s]+-){3}(\d{4})$

I've simplified your regex a bit, but made the assumption that you weren't using each group of characters for something.

I turned your:

(.{1,}-)(.{1,}-)(.{1,}-)

into:

([^\s]+-){3}

Which says "match anything that isn't whitespace and ends with a dash, exactly 3 times". The '+' operator is shorthand for {1,}, or "at least once".

Can we be more specific?

I suspect that you probably only want to match alphanumeric values. For example, I doubt that $-A%^A-@-0001 is a valid match for you. If I'm right about this, you'll want a more restrictive character class, which would make your regex look like this instead (I'm assuming your regex is case sensitive):

Match Substrings:

(?<=\s|^)([A-Za-z\d]+-){3}(\d{4})(?=\s|$)

Match Whole Strings:

^([A-Za-z\d]+-){3}(\d{4})$
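As a rough sketch of the substring variant in Python (an assumed engine, not one named in the question): Python's `re` module rejects `(?<=\s|^)` because its lookbehinds must be fixed-width, so this example swaps in the equivalent negative forms `(?<!\S)` and `(?!\S)`, meaning "not preceded or followed by a non-whitespace character". The helper name `find_codes` is hypothetical.

```python
import re

# Same idea as the substring regex, with the lookarounds rewritten as
# negative assertions because Python requires fixed-width lookbehinds.
# (?<!\S) : start of string or preceded by whitespace
# (?!\S)  : end of string or followed by whitespace
TOKEN = re.compile(r"(?<!\S)(?:[A-Za-z\d]+-){3}\d{4}(?!\S)")


def find_codes(text):
    """Return every whitespace-delimited code found in text (hypothetical helper)."""
    return [m.group(0) for m in TOKEN.finditer(text)]


print(find_codes("ok A-AAAA-A-0001 bad A-AAAA-A-00011 end B-22-C-1234"))
# → ['A-AAAA-A-0001', 'B-22-C-1234']
```

The five-digit token is skipped entirely rather than truncated, because the trailing assertion fails when another non-whitespace character follows the fourth digit.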

A couple of pointers:

  • Instead of specifying {4,4} to say "exactly 4 times", just use {4}
  • Instead of specifying {1,} to say "1 or more times", just use +
  • (?={regex in here}) means, look ahead in the string and match the following, but don't add it to my result (it's called a positive lookahead)
  • (?<={regex in here}) means, look backwards in the string and match the following, but don't add it to my result (it's called a positive lookbehind)
  • There are also negative lookarounds that do the opposite, but I'll leave that to you to research.
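The "don't add it to my result" part of the lookaround pointers can be checked with a small sketch. It assumes Python's `re` module; the sample text and the helper name `last_four` are made up for illustration.

```python
import re

# (?<=-)    positive lookbehind: a dash must come right before the digits
# (?=\s|$)  positive lookahead: whitespace or end of string must follow
# Neither assertion becomes part of the matched text.
DIGITS = re.compile(r"(?<=-)\d{4}(?=\s|$)")


def last_four(text):
    """Return the match object for the four digits, or None (hypothetical helper)."""
    return DIGITS.search(text)


text = "id=A-AAAA-A-0001 rest"
m = last_four(text)
print(m.group(0))           # → 0001
print(text[m.start() - 1])  # the dash sits just outside the match
```

The dash and the space are required for the match to succeed, yet `m.group(0)` contains only the four digits.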

Upvotes: 3

zerkms

Reputation: 255005

Anchor your regular expression with ^ and $ for the beginning and end of the string, respectively.

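If the pattern is not the whole string, then a negative lookahead assertion, (?!\d), placed after the last group would help: it rejects a match that is immediately followed by another digit.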
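A minimal sketch of the lookahead approach, assuming Python's `re` module and the regex from the question with `(?!\d)` appended. The helper name `find_code` and the third test input are hypothetical.

```python
import re

# The question's regex with a negative lookahead appended:
# after the four digits, the next character must not be a digit.
PATTERN = re.compile(r"(.{1,}-)(.{1,}-)(.{1,}-)(\d{4})(?!\d)")


def find_code(text):
    """Return the matched text, or None if there is no match (hypothetical helper)."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


print(find_code("A-AAAA-A-0001"))   # → A-AAAA-A-0001
print(find_code("A-AAAA-A-00011"))  # → None
print(find_code("A-AAAA-A-0001."))  # → A-AAAA-A-0001
```

Note that the lookahead only guards the right-hand side. Because `.` matches almost any character, the leading groups can still absorb text in front of the code when it is embedded in a longer string.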


Upvotes: 2
