Reputation: 21
I am trying to use some regex to validate some input inside of Java code. I have been successful in implementing "basic" regex, but this one seems to be out of my scope of knowledge. I am working through RegEgg tutorials to learn more.
Here are the conditions that need to be validated:
Or
This was my regex before additional validations were added:
^(\s{8}|[A-Za-z\-\!\&][ A-Za-z0-9\-\!\&]{7})$"
Then the additional validations for now allowing multiple of the special characters, and I am a bit stuck. I have been successful in using a positive lookahead, but stuck when trying to use the positive lookbehind. (I think the data before the lookbehind was consumed), but I am speculating as I am a neophyte with this part of regex.
Upvotes: 2
Views: 945
Reputation:
Its a matter of asserting the right conditions at the start of the regex.
^(?=[ ]*$|(?![ ]))(?!.*([!&-]).*(?!\1)[!&-])[a-zA-Z0-9 !&-]{8}$
see -> https://regex101.com/r/tN5y4P/1
Some discussion:
^ # Begin of text
(?= # Assert, cannot start with a space
[ ]* $ # unless it's all spaces
| (?! [ ] )
)
(?! # Assert, not mixed special chars
.*
( [!&-] ) # (1)
.*
(?! \1 )
[!&-]
)
[a-zA-Z0-9 !&-]{8} # Consume 8 valid characters from within this class
$ # End of text
Upvotes: 0
Reputation: 102822
using the or construct (a|b
) is a large part of this, and you've begun applying it, so that's a good start.
You've made the rule that it can't start with a digit; nothing in the spec says this. also, -
inside []
has special meaning, so escape it, or make sure it is first or last, because then you don't have to. That gets us to:
^(\s{8}|[A-Za-z0-9-!& -]{8})$
next up is the rule that it has to be all the same special character if used at all. Given that there are only 3 special characters, could be easier to just explicitly list them all:
^(\s{8}|[A-Za-z0-9 -]{8}|[A-Za-z0-9 !]{8}|[A-Za-z0-9 &]{8})$
Next up: Can't start with a space, and can't be all-special. Confirming the negative (that it ISNT all-special characters) gets complicated; lookahead seems like a better plan here. This:
^
is regexp-ese for: "Start of line". Note that this doesn't 'consume' a character. 1
is regexpese for 'only the exact character '1' will match here, nothinge else', but as it matches, it also 'consumes' that character, whereas ^
doesn't do that. 'start of line' is not a concept that can be consumed.
This notion of 'a match may fail, but if it succeeds, nothing is consumed' isn't limited to ^
and $
; you can write your own:
(?=abc)
will match if abc
would match at this position, but does not consume it. Thus, the regexp ^(=abc)ab.d$
would match the input string abcd
and nothing else. This is called positive lookahead. (it 'looks ahead' and matches if it sees the regular expression in the parens, failing if it does not).
(?!abc)
is negative lookahead. It matches if it DOESNT see the thing in the parens. (?!abc)a.c
will match the input adc
but not the input abc
.
(?<=abc)
is positive lookbehind. It matches if the pattern you provide would match such that the match ends at the position you find yourself.
(?<!abc)
is negative lookbehind.
Note that lookahead and lookbehind can be somewhat limited, in that they may not allow variable length patterns. But, fortunately, your requirements make it easy to limit ourselves to fixed size patterns here. Thus, we can introduce: (?![&!-]{8})
as a non-consuming unit in our regexp that will fail the match if we have all-8 special characters.
We can use this trick to fail on starting space too: (?! )
is all we need for that one.
Let's replace \s
which is whitespace with just
which is the space character (the problem description says 'space', not 'whitespace').
Putting it all together:
^( {8}|(?! )(?![&!-]{8})([A-Za-z0-9 -]{8}|[A-Za-z0-9 !]{8}|[A-Za-z0-9 &]{8}))$
Thats:
Plug em in at regex101 along with your various examples of 'legal' and 'not legal' and you can play around with it some more.
NB: You can also use backreferences to attempt to solve the 'only one special character is allowed' part of this, but attempting to tackle the 'not all special characters' part seems quite unwieldy if you don't get to use (negative) lookahead.
Upvotes: 2