Sam
Sam

Reputation: 2347

Looking for another regex explanation

In my regex expression, I was trying to match a password between 8 and 16 character, with at least 2 of each of the following: lowercase letters, capital letters, and digits.

In my expression I have:

^((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,16})$

But I don't understand why it wouldn't work like this:

^((?=\d)(?=[a-z])(?=[A-Z])(?=\d)(?=[a-z])(?=[A-Z]){8,16})$

Doesnt ".*" just meant "zero or more of any character"? So why would I need that if I'm just checking for specific conditions?

And why did I need the period before the curly braces defining the limit of the password?

And one more thing, I don't understand what it means to "not consume any of the string" in reference to "?=".

Upvotes: 0

Views: 125

Answers (4)

Trott
Trott

Reputation: 70065

Your last two questions are related. The ?= (which is called a lookahead, by the way) doesn't consume any of the string, meaning that it tests a condition of the string but itself is zero-characters long. If the lookahead is true, then the matching continues, but the next part of the expression starts from where you were before you checked the lookahead.

Because all your stuff is made up of lookaheads, they all add up to zero characters in length. So, for {8,16} to match something, you need to supply the . first. .{8,16} means "8 to 16 characters, I don't care what those characters are." {8,16} without anything before it isn't a valid expression (or at least won't mean what .{8,16} means).

Regarding your first question, you need .* in each of your lookaheads because your expression starts with ^. That means "starting at the very beginning of the string" rather than "matching anywhere within the string". Since you're not trying to match only at the beginning of the string, .* allows you to have the lookaheads affect anywhere in the string.

Lastly, I'm afraid your regexp doesn't work. Because the lookaheads are zero-length, putting the same lookahead in twice as you have done will match the same thing twice. So this expression only checks if you have a single instance of each of the types of characters that you want to enforce there being two instances of. The expression you want is more like this:

^((?=.*\d.*\d)(?=.*[a-z].*[a-z])(?=.*[A-Z].*[A-Z]).{8,16})$

And that expression is equivalent to the more elegant:

^((?=(.*\d){2})(?=(.*[a-z]){2})(?=(.*[A-Z]){2}).{8,16})$

(And, giving credit where it's due, Dennis beat me to that last expression. Well done, sir.)

Upvotes: 2

Dan
Dan

Reputation: 10786

The difference between (?=.*\d) and (?=\d) is that, while they are both zero-width lookaheads, is that the former will match if there is a digit anywhere in the string (after the current location), but the latter will match only if that digit is immediately after the current location. So, that first regex looks for 8-16 characters, including one digit, lowercase, and uppercase each. The second regex requires the first character to be a digit, and a lowercase, and an uppercase, which is absurd. If you want to math two digits, then instead of (?=.*\d)(?=.*\d), do (?=.*\d.*\d).

Upvotes: 1

Dennis
Dennis

Reputation: 14477

Your expression will not work as you want it to.

Because of the lookaheads, both instances of (?=.*\d) will actually match the same digit, thus validating passwords with only one digit.

This should work:

^(?=(.*\d){2})(?=(.*[a-z]){2})(?=(.*[A-Z]){2}).{8,16}$

Upvotes: 1

Martin.
Martin.

Reputation: 10529

The problem is that this character ^ means something like 'Right on start'. It means that these specific characters SHOULD BE strictly at the start of text you're searching in, which is not what you want.

Upvotes: 1

Related Questions