djsf

Reputation: 73

What is the purpose of .* in a Python lookahead regex?

I am learning about regular expressions, and I found an interesting and helpful page on using them for password input validation here. The question I have is about the .* in the following expression:

"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d]{8,}$"

I understand that .* is a wildcard pattern matching any amount of text (or no text), but I'm having trouble wrapping my head around its purpose in these lookahead expressions. Why is it necessary in order to make these lookaheads function as needed?

Upvotes: 3

Views: 1646

Answers (1)

willeM_ Van Onsem

Reputation: 476574

A lookahead checks the text that starts directly at the current position. So if you write:

(?=a)

it means that the character at the current position (here, the first character) should be an a. Sometimes, for instance with password checking, you do not want that. You want to express that there should be an a somewhere. So:

(?=.*a)

means that the first character can, for instance, be a b, an 8 or an @, but that eventually there should be an a somewhere.
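
The difference can be checked with a minimal sketch. The sample strings and the variable names direct and anywhere are chosen here for illustration only:

```python
import re

# (?=a) only inspects the character at the current position (here: the start).
direct = re.compile(r"^(?=a)")
# (?=.*a) lets .* skip over any preceding characters before requiring an a.
anywhere = re.compile(r"^(?=.*a)")

samples = ["abc", "b8@a", "b8@"]
results = {s: (bool(direct.search(s)), bool(anywhere.search(s))) for s in samples}

for s, (d, a) in results.items():
    print(f"{s!r}: (?=a) -> {d}, (?=.*a) -> {a}")
```

Only "abc" passes the direct lookahead, while both "abc" and "b8@a" pass the version with .* in it.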

Your regex thus means:

^               # start a match at the beginning of the string
(?=.*[a-z])     # should contain at least one a-z character
(?=.*[A-Z])     # should contain at least one A-Z character
(?=.*\d)        # should contain at least one digit
[a-zA-Z\d]{8,}  # consists of 8 or more characters, and only A-Za-z0-9
$               # end the match at the end of the string
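
As a rough check of this reading, the sketch below runs the pattern against a few made-up passwords. The helper name is_valid and the sample inputs are assumptions for illustration, not part of the original page:

```python
import re

PASSWORD_RE = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d]{8,}$")

def is_valid(password):
    """Hypothetical helper: True if the whole pattern matches."""
    return PASSWORD_RE.match(password) is not None

checks = {
    "Passw0rdX": is_valid("Passw0rdX"),  # lowercase, uppercase, digit, 9 chars
    "password1": is_valid("password1"),  # no uppercase letter
    "PASSWORD1": is_valid("PASSWORD1"),  # no lowercase letter
    "Passwordx": is_valid("Passwordx"),  # no digit
    "Pw0rd": is_valid("Pw0rd"),          # shorter than 8 characters
    "Passw0rd!": is_valid("Passw0rd!"),  # ! is outside [a-zA-Z\d]
}
print(checks)
```

Each lookahead rules out one class of input, and the final character class enforces both the length and the allowed alphabet.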

Without the .*, there could never be a match, since:

 "^(?=[a-z])(?=[A-Z])(?=\d)[a-zA-Z\d]{8,}$"

means:

^               # start a match at the beginning of the string
(?=[a-z])       # first character should be an a-z character
(?=[A-Z])       # first character should be an A-Z character
(?=\d)          # first character should be a digit
[a-zA-Z\d]{8,}  # consists of 8 or more characters, and only A-Za-z0-9
$               # end the match at the end of the string

No character is a lowercase letter, an uppercase letter and a digit at the same time, so this can never be satisfied.
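
A small experiment supports this. The sketch below tries every allowed first character in front of a fixed tail (the tail "bcDEF123" is an arbitrary choice for illustration) and collects the strings that match:

```python
import re
import string

NO_DOT_STAR = re.compile(r"^(?=[a-z])(?=[A-Z])(?=\d)[a-zA-Z\d]{8,}$")

# Every candidate is 9 characters long and uses only A-Za-z0-9,
# so only the three lookaheads can reject it.
candidates = [c + "bcDEF123" for c in string.ascii_letters + string.digits]
matches = [s for s in candidates if NO_DOT_STAR.match(s)]

print(len(candidates), matches)
```

All three lookaheads test the same first character, so the list of matches stays empty.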

Side notes:

  1. we do not capture in the lookahead, so greediness does not matter;
  2. the dot . by default does not match the newline character;
  3. even if it did, the fact that you have a constraint ^[A-Za-z0-9]{8,}$ means that you would only validate input with no newline inside it (in Python, $ still tolerates a single trailing newline; re.fullmatch or \Z rules that out as well).
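
These side notes can be checked with a short sketch. The sample strings are made up for illustration. The last two lines rely on Python's rule that $ also matches just before a newline at the very end of the string:

```python
import re

PATTERN = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d]{8,}$"

# Note 1: a lazy .*? in the lookahead accepts the same input as the greedy .*
greedy_ok = re.match(r"^(?=.*a)", "b8@a") is not None
lazy_ok = re.match(r"^(?=.*?a)", "b8@a") is not None

# Note 2: by default . stops at a newline; re.DOTALL lets it cross one.
dot_default = re.match(r"^(?=.*a)", "b\na") is not None
dot_all = re.match(r"^(?=.*a)", "b\na", re.DOTALL) is not None

# Note 3: an embedded newline is rejected by the character class,
# even when re.DOTALL lets the lookaheads see past it.
embedded = re.match(PATTERN, "Passw0\nrdX", re.DOTALL) is not None

# $ matches before a trailing newline, so re.match accepts this input
# while re.fullmatch does not.
trailing_match = re.match(PATTERN, "Passw0rdX\n") is not None
trailing_full = re.fullmatch(PATTERN, "Passw0rdX\n") is not None

print(greedy_ok, lazy_ok, dot_default, dot_all, embedded, trailing_match, trailing_full)
```

If a trailing newline must be rejected too, re.fullmatch (or ending the pattern with \Z instead of $) is one way to do it.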

Upvotes: 3
