Reputation: 250
I'm trying to create a regex in JavaScript that restricts the order in which characters can appear, but I'm having trouble getting the validation to be fully correct.
The criteria for the expression are a little complicated. The user must input strings that meet the following criteria:
This probably sounds a little confusing.
For example, the following examples are valid:
<C>:Cu;
<Cu>:Cv;
/_V<C>V:C;
/_VV<Cv>VV_/:Cu;
_<V>:V1;
_<V>_:V1;
_<V>/:V1;
_<V>:*;
_<m>:n;
The following are invalid:
Cu:Cv;
Cu:Cv
CuCv;
<Cu/>:Cv;
<Cu_>:Cv;
<Cu>:Cv/;
_/<Cu>:Cv;
<Cu>/_:Cv;
They should also validate when grouped together, like so:
<Cu>:Cv;/_V<C>V:C;_<V>:V1;_<V>/:V1;_<V>:*;_<m>:n;
Hopefully, these examples help you understand what I'm trying to match.
I created the following regexp and tested it on Regex101.com, but this is the closest I could come:
(\/{0,1}_{0,1}[A-Za-z0-9]{0,}<{1}[A-Za-z0-9]{1,2}>{1}[A-Za-z0-9]{0,}_{0,1}\/{0,1}):([A-Za-z0-9]{1,2}|\*);$
It's mostly correct, but it allows strings that should be invalid such as:
_/<C>:C;
If an underscore comes before the first forward-slash, it should be rejected. Otherwise, my regexp seems to be correct for all other cases.
If anyone has any suggestions on how to fix this, or knows of a way to match all criteria much more efficiently, any help is appreciated.
Upvotes: 1
Views: 1461
Reputation: 6682
Did you mean this?
/^(?:(^|\s*;\s*)(?:\/_|_)?[a-z]*<[a-z]+>[a-z]*_?\/?:(?:[a-z0-9]+|\*)(?=;))+;$/i
We start with a case-insensitive expression /.../i to keep it more readable. You have to rewrite it as a case-sensitive expression if you only want to allow uppercase at the beginning of a word.
^ means the beginning of the string. $ means the end of the string.
The whole string ends with ';' after multiple repetitions of the inner expression (?:...)+ where + means 1 or more occurrences. The ;$ at the end includes the last semicolon in the result. The look-ahead already ensures that each group is followed by a semicolon; the trailing ;$ additionally ensures that nothing follows the final semicolon.
(^|\s*;\s*)
Every part is at the beginning of the string or after a semicolon surrounded by arbitrary whitespace, including linefeeds. Use \n instead of \s if you do not want to allow spaces and tabs.
(?:...|...) is a non-capturing alternation. ? after a character or group is the quantifier 0/1: none or once.
So (?:\/_|_)? means '/_', '_' or nothing. Use \/?_? if you also want to allow strings starting with a single slash.
[a-z]*<[a-z]+>[a-z]*
0 or more letters followed by <...> with at least one letter inside and again followed by 0 or more letters.
_?\/?:
Optional '_', optional '/', mandatory ':', in this order.
(?:[a-z0-9]+|\*)
The part after the colon contains letters and numbers or the asterisk.
(?=;)
Look-ahead: Every group must be followed by a semicolon. Look-ahead conditions do not move the search position.
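To check the walkthrough above against the question's examples, here is a minimal sketch that runs the same pattern over the valid and invalid strings listed in the question. The helper name isValidRuleList and the two arrays are only illustrative and are not part of the original answer.

```javascript
// Pattern copied unchanged from the answer above (case-insensitive).
const rulePattern = /^(?:(^|\s*;\s*)(?:\/_|_)?[a-z]*<[a-z]+>[a-z]*_?\/?:(?:[a-z0-9]+|\*)(?=;))+;$/i;

// Hypothetical helper: true when the whole input is accepted by the pattern.
function isValidRuleList(input) {
  return rulePattern.test(input);
}

// Examples the question lists as valid, including the grouped form.
const valid = [
  "<C>:Cu;", "<Cu>:Cv;", "/_V<C>V:C;", "/_VV<Cv>VV_/:Cu;",
  "_<V>:V1;", "_<V>_:V1;", "_<V>/:V1;", "_<V>:*;", "_<m>:n;",
  "<Cu>:Cv;/_V<C>V:C;_<V>:V1;_<V>/:V1;_<V>:*;_<m>:n;",
];

// Examples the question lists as invalid, plus the problem case "_/<C>:C;".
const invalid = [
  "Cu:Cv;", "Cu:Cv", "CuCv;", "<Cu/>:Cv;", "<Cu_>:Cv;",
  "<Cu>:Cv/;", "_/<Cu>:Cv;", "<Cu>/_:Cv;", "_/<C>:C;",
];

console.log("all valid accepted:", valid.every(isValidRuleList));
console.log("all invalid rejected:", !invalid.some(isValidRuleList));
```

Because the pattern is anchored with ^ at the start and ;$ at the end, test() has to account for the whole string, so a leading "_/" cannot be skipped over. With the (?:\/_|_)? prefix, a string that starts with a single slash, such as "/<C>:C;", is rejected as well.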
Upvotes: 1
Reputation: 19641
The following seems to fulfill all the criteria:
(?:^|;)(\/?_?[a-zA-Z0-9]*<(?:[a-zA-Z]|C[uv]?)>[a-zA-Z0-9]*_?\/?):([a-zA-Z0-9]+|\*)(?=;|$)
It puts each of the "groups" in a capturing group so you can access them individually.
Details:
(?:^|;)
A non-capturing group to make sure the match either starts at the beginning of the string or is preceded by a semicolon.
(
Start of group 1.
\/?_?
An optional forward-slash followed by an optional underscore.
[a-zA-Z0-9]*
Any letter or number - Matches zero or more.
<(?:[a-zA-Z]|C[uv]?)>
Mandatory <> pair containing one letter, or the capital letter C followed by a lowercase u or v.
[a-zA-Z0-9]*
Any letter or number - Matches zero or more.
_?\/?
An optional underscore followed by an optional forward-slash.
)
End of group 1.
:
Matches a colon character literally.
([a-zA-Z0-9]+|\*)
Group 2 - containing one or more numbers or letters or a single * character.
(?=;|$)
A positive lookahead to make sure the match is either followed by a semicolon or is at the end of the string.
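As a rough illustration of accessing the two capturing groups individually, the sketch below applies the same pattern with the g flag through String.prototype.matchAll. The function name extractRules and the property names left and right are assumptions made for this example only.

```javascript
// Pattern copied from the answer above; the g flag is added because matchAll requires it.
const groupPattern = /(?:^|;)(\/?_?[a-zA-Z0-9]*<(?:[a-zA-Z]|C[uv]?)>[a-zA-Z0-9]*_?\/?):([a-zA-Z0-9]+|\*)(?=;|$)/g;

// Hypothetical helper: returns one object per matched group.
// "left" is capturing group 1 (before the colon), "right" is group 2 (after it).
function extractRules(input) {
  return Array.from(input.matchAll(groupPattern), (m) => ({
    left: m[1],
    right: m[2],
  }));
}

const rules = extractRules("<Cu>:Cv;/_V<C>V:C;_<V>:*;");
for (const rule of rules) {
  console.log(rule.left, "->", rule.right);
}
```

Note that this pattern is not anchored at both ends, so it finds the groups that match rather than rejecting an input that also contains a malformed group. If the whole string has to be validated, one option is to compare the number of matches with the number of semicolon-separated parts.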
Upvotes: 2