Joshua Snider

Reputation: 795

Matching a string in Lex that contains at least one uppercase and one lowercase letter

I'm trying to match a string of 4 to 8 mixed-case letters that contains at least one uppercase and at least one lowercase letter. I tried [a-zA-Z]{4,8}, but that also matches strings like abba and CREEEDD, which contain only lowercase or only uppercase letters. Is this something that can be done in Lex, or do I need to do it differently?

Upvotes: 0

Views: 1779

Answers (2)

Bryan Olivier

Reputation: 5307

This cries out for an & (and) operation in regular expressions; with it, the following would do the job:

([a-zA-Z]*(([a-z][a-zA-Z]*[A-Z])|([A-Z][a-zA-Z]*[a-z]))[a-zA-Z]*)&([a-zA-Z]{4,8})

but that operation doesn't exist. Of course you could enumerate every possible position of the lowercase and uppercase letters among the mixed-case ones, but that would amount to an enormous expression.

Isn't it feasible to match all strings of 4 to 8 letters first and then check separately for the presence of both lowercase and uppercase letters? Maybe you could match with one of the two regular expressions and then apply the other to the results.

As a side note: there is no theoretical objection to the & operation, as deterministic finite automata are closed under intersection, although the number of states can explode. The likely obstacle is that it would require major modifications to the usual interpreters of non-deterministic finite automata.

Oh and if someone feels challenged to make a difference, then don't forget about the complement either.
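To make the closure-under-intersection remark concrete, here is a minimal sketch in C that runs two small automata in lockstep, which is what the product construction amounts to. One automaton tracks which letter cases have been seen, the other counts letters up to a dead state. The function name accepts_intersection and the state encoding are made up for illustration, and the range comparisons assume an ASCII-compatible character set.

```c
#include <stdbool.h>

/* Hypothetical helper: simulate the product of two DFAs.
 * DFA 1 state: bit 0 = a lowercase letter was seen,
 *              bit 1 = an uppercase letter was seen (states 0..3).
 * DFA 2 state: number of letters read so far, saturating at 9,
 *              which acts as the "too long" dead state (states 0..9). */
bool accepts_intersection(const char *s)
{
    int case_state = 0;
    int len_state = 0;

    for (; *s != '\0'; ++s) {
        char c = *s;
        if (c >= 'a' && c <= 'z')
            case_state |= 1;
        else if (c >= 'A' && c <= 'Z')
            case_state |= 2;
        else
            return false;   /* neither automaton accepts non-letters */
        if (len_state < 9)
            ++len_state;    /* stay in the dead state once reached */
    }
    /* A product state is accepting only if both components accept. */
    return case_state == 3 && len_state >= 4 && len_state <= 8;
}
```

For this particular pair the product has at most 4 x 10 = 40 states, so the blow-up is modest here; it is the general case that can get large.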

Upvotes: 2

Jonathan Leffler

Reputation: 754160

You need a string with zero or more mixed case letters, followed by an upper case letter, zero or more mixed case letters, a lower case letter, and zero or more mixed case letters, or the similar pattern with the lower case before the upper case.

However, that's messy. So, we can try to simplify. The first character may be upper case, so we need it followed by zero or more mixed case letters, a lower case letter, and zero or more mixed case letters again. Or the first character may be lower case, so we need it followed by zero or more mixed case letters, an upper case letter, and zero or more mixed case letters again.

[a-z][a-zA-Z]*[A-Z][a-zA-Z]*|[A-Z][a-zA-Z]*[a-z][a-zA-Z]*

The residual problem is limiting the total length to the range 4-8 characters (noting that just 8 alphabetic characters is pathetic for a password these days; allow punctuation and digits and longer than 8 characters). I'd implement the length validation in the action after the pattern is recognized.
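A minimal sketch of the length check such an action could perform, assuming the matched length is passed in from yyleng. The function name length_in_range is made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: true when a match of `len` characters falls in the
 * 4-8 range the question asks for. In a Lex action the length of the
 * current match is available as yyleng, so the call would be
 * length_in_range(yyleng). */
bool length_in_range(int len)
{
    return len >= 4 && len <= 8;
}
```

One consequence worth noting: because the pattern above has no upper bound, a longer run of letters is matched in full and then fails this check, rather than being split into shorter tokens.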

Alternatively, and probably more simply, use your existing rule:

[a-zA-Z]{4,8}

and apply the mixed-case validation in the action:

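/* Needs <ctype.h> and <string.h> in the definitions section. */
if (islower((unsigned char)yytext[0]) && strpbrk(yytext, "ABCDEFGHIJKLMNOPQRSTUVWXYZ") == NULL)
    ...reject...    /* starts lowercase and has no uppercase letter */
else if (isupper((unsigned char)yytext[0]) && strpbrk(yytext, "abcdefghijklmnopqrstuvwxyz") == NULL)
    ...reject...    /* starts uppercase and has no lowercase letter */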
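The same validation could also be factored into a small helper that scans the match once. This is only a sketch; the name has_mixed_case is hypothetical, and it would be called as has_mixed_case(yytext) from the action.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true if text contains at least one lowercase
 * letter and at least one uppercase letter. */
bool has_mixed_case(const char *text)
{
    bool seen_lower = false;
    bool seen_upper = false;

    for (; *text != '\0'; ++text) {
        /* The <ctype.h> functions expect an unsigned char value. */
        unsigned char c = (unsigned char)*text;
        if (islower(c))
            seen_lower = true;
        else if (isupper(c))
            seen_upper = true;
    }
    return seen_lower && seen_upper;
}
```

Compared with the strpbrk version, this does not depend on which case the first character happens to be in, at the cost of a few more lines.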

Upvotes: 1
