Reputation: 126

Explain the Regex mentioned

Can anyone please explain the regex below? It has been used in my application for a very long time, since before I joined, and I am very new to regexes.

/^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$/

As far as I understand

this regex will validate
- for a minimum of 6 characters to a maximum of 10 characters
- will escape characters like ^ and $

Also, my basic need is a regex that requires a minimum of 6 characters, with at least one character being a digit and another being a special character.

Upvotes: 0

Answers (4)

JDB

Reputation: 25810

^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$

^ is called an "anchor". It basically means that any following text must be immediately after the "start of the input". So ^B would match "B" but not "AB", because in the second input "B" is not the first character.
.* matches 0 or more characters - any character except a newline (by default). This is what's known as a greedy quantifier - the regex engine will match ("consume") all of the characters to the end of the input (or the end of the line) and then work backwards for the rest of the expression (it "gives up" characters only when it must). In a regex, once a character is "matched", no other part of the expression can "match" it again (except for zero-width lookarounds, which are coming next).
(?=.{6,10}) is a lookahead anchor and it matches a position in the input. It finds a place in the input where there are 6 to 10 characters following, but it does not "consume" those characters, meaning that the following expressions are free to match them.
(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z]) is another lookahead anchor. It matches a position in the input where the following text contains four letters ([a-zA-Z] matches one lowercase or uppercase letter), but any number of other characters (including zero characters) may be between them. For example: "++a5b---C@D" would match. Again, being an anchor, it does not actually "consume" the matched characters - it only finds a position in the text where the following characters match the expression.
(?=.*\d.*\d) Another lookahead. This matches a position where two digits follow (with any number of other characters in between).
.* Already covered this one.
$ This is another kind of anchor that matches the end of the input (or the end of a line - the position just before a newline character). It says that the preceding expression must match characters at the end of the string. When ^ and $ are used together, it means that the entire input must be matched (not just part of it). So /bcd/ would match "abcde", but /^bcd$/ would not match "abcde" because "a" and "e" could not be included in the match.
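
To make the anchor and lookahead behaviour above concrete, here is a small sketch. It assumes Python's re module, which handles these particular patterns the same way as described (the question does not say which engine the application uses), and the variable names are only for illustration.

```python
import re

# Unanchored: the pattern may match anywhere inside the input.
unanchored = re.search(r"bcd", "abcde")

# Anchored at both ends: the whole input has to be exactly "bcd".
anchored = re.search(r"^bcd$", "abcde")

# The four-letter lookahead succeeds at position 0 of the sample input,
# but it consumes nothing, so the match itself is empty.
four_letters = re.compile(r"(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])")
lookahead_only = four_letters.match("++a5b---C@D")

# Because the lookahead consumed nothing, the rest of the pattern
# still starts matching from the same position.
digit_then_letters = re.search(r"(?=.*\d)[a-z]+", "abc1")

print(unanchored.group())          # the substring found by the unanchored pattern
print(anchored)                    # no match object for the anchored pattern
print(lookahead_only.span())       # start and end of the zero-width match
print(digit_then_letters.group())  # letters matched after the lookahead passed
```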

NOTE

This looks like a password validation regex. If it is, please note that it's broken. The .* at the beginning and end will allow the password to be arbitrarily longer than 10 characters. It could also be rewritten to be a bit shorter. I believe the following will be an acceptable (and slightly more readable) substitute:

^(?=(.*[a-zA-Z]){4})(?=(.*\d){2}).{6,10}$

Thanks to @nhahtdh for pointing out the correct way to implement the character length limit.
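
A quick way to see the difference is to run both patterns against a few inputs. This is only a sketch, assuming Python's re module and made-up sample passwords; the helper name accepts is hypothetical.

```python
import re

# The regex from the question, without the surrounding slashes.
ORIGINAL = re.compile(
    r"^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$"
)
# The shorter substitute suggested above.
REWRITTEN = re.compile(r"^(?=(.*[a-zA-Z]){4})(?=(.*\d){2}).{6,10}$")


def accepts(pattern, text):
    """Return True if the compiled pattern matches the text."""
    return pattern.search(text) is not None


# Illustrative inputs only; they are not taken from the application.
samples = ["abcd12", "abcd12efgh", "abcdefghij1234567890", "abc12", "abcdef"]

for text in samples:
    print(f"{text!r}: original={accepts(ORIGINAL, text)}, "
          f"rewritten={accepts(REWRITTEN, text)}")
```

The 20-character sample is the interesting one: the original accepts it, while the substitute rejects it because .{6,10}$ has to cover the whole input.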

Upvotes: 7

Erik

Reputation: 12858

For your regex request, here is what you would use:

^(?=.{6,}$)(?=.*?\d)(?=.*?[!@#$%&*()+_=?\^-]).*

And here it is unrolled for you:

^          // Anchor the beginning of the string (password).

(?=.{6,}$) // Look ahead: Six or more characters, then the end of the string.

(?=.*?\d)  // Look ahead: Anything, then a single digit.

(?=.*?[!@#$%&*()+_=?\^-]) // Look ahead: Anything, and a special character.

.*         // Passes our look aheads, let's consume the entire string.

As you can see, the special characters have to be explicitly defined as there is not a reserved shorthand notation (like \w, \s, \d) for them. Here are the accepted ones (you can modify as you wish):

!, @, #, $, %, ^, &, *, (, ), -, +, _, =, ?

The key to understanding regex look aheads is to remember that they do not move the position of the parser: (?=...) starts looking at the first character after the last pattern match, as do any subsequent (?=...) look aheads.
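
If you want to try the pattern before putting it in your application, a minimal check could look like the following. It assumes Python's re module, and the sample passwords and the helper name is_valid are made up for illustration.

```python
import re

# At least six characters, at least one digit, at least one listed special character.
PASSWORD = re.compile(r"^(?=.{6,}$)(?=.*?\d)(?=.*?[!@#$%&*()+_=?\^-]).*")


def is_valid(password):
    """Return True if the password satisfies all three look aheads."""
    return PASSWORD.search(password) is not None


# Illustrative inputs only.
for candidate in ["abc1!x", "pass-word9", "abcdef1", "abc!def", "a1!"]:
    print(f"{candidate!r}: {is_valid(candidate)}")
```

Only the first two candidates contain six or more characters, a digit, and one of the listed special characters; each of the others fails exactly one look ahead.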

Upvotes: 0

nhahtdh

Reputation: 56809

Check Cyborgx37's answer for the syntax explanation. I'll do some explanation on the meaning of the regex.

^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$

The first .* is redundant, since the rest of the pattern consists of zero-width assertions that begin with the any-character ., followed by .* at the end.

The regex requires a minimum of 6 characters, due to the assertion (?=.{6,10}). However, there is no upper limit on the number of characters in the string that the regex can match. This is because of the .* at the end (the .* in the front also contributes).

The (?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z]) part asserts that there are at least 4 English letters (uppercase or lowercase), and (?=.*\d.*\d) asserts that there are at least 2 digits (0-9). Since [a-zA-Z] and \d are disjoint sets, these 2 conditions combined already require at least 6 characters, which makes (?=.{6,10}) redundant.

The syntax of .*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z] is also needlessly verbose. It can be shortened with the use of repetition: (?:.*[a-zA-Z]){4}.

The following regex is equivalent to your original regex. However, I really doubt that your current one, or this equivalent rewrite, does what you want:

^(?=(?:.*[a-zA-Z]){4})(?=(?:.*\d){2}).*$

This version is more explicit about the length, since clarity is always better. The meaning stays the same:

^(?=(?:.*[a-zA-Z]){4})(?=(?:.*\d){2}).{6,}$

Recap:

Minimum length = 6
No limit on maximum length
At least 4 English letters, lowercase or uppercase
At least 2 digits 0-9
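
The equivalence claimed above can be spot-checked by running all three patterns over the same inputs. This is a sketch assuming Python's re module; the sample strings and the helper name verdicts are illustrative, and agreement on a handful of inputs is a sanity check rather than a proof.

```python
import re

PATTERNS = {
    "original": r"^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$",
    "shortened": r"^(?=(?:.*[a-zA-Z]){4})(?=(?:.*\d){2}).*$",
    "explicit_length": r"^(?=(?:.*[a-zA-Z]){4})(?=(?:.*\d){2}).{6,}$",
}


def verdicts(text):
    """Return, for each pattern name, whether the pattern matches the text."""
    return {name: re.search(rx, text) is not None for name, rx in PATTERNS.items()}


# Illustrative inputs only.
samples = ["abcd12", "ab12cd", "a1b2c3d4e5f6g7h8", "abc12", "abcd1", "1234ab"]

for text in samples:
    result = verdicts(text)
    agree = len(set(result.values())) == 1  # all three patterns give the same answer
    print(f"{text!r}: {result} agree={agree}")
```

Note that the 16-character sample is accepted by all three, which matches the "no limit on maximum length" point in the recap.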

Upvotes: 2

Sam I am says Reinstate Monica

Reputation: 31184

REGEXPLANATION

/.../: slashes are often used to represent the area where the regex is defined
^: matches beginning of input string
.: matches any single character (except a newline, by default)
*: matches the previous symbol 0 or more times
.{6,10}: matches .(any character) somewhere between 6 and 10 times
[a-zA-Z]: matches any single character between a and z or between A and Z
\d: matches a digit.
$: matches the end of input.

I think that just about does it for all the symbols in the regex you've posted

Upvotes: 0
