Reputation: 7714
I am trying to build a regex that validates UK company names against the permitted-character rules in The Company Regulations 2015. The rules I am trying to match:
1. Permitted word characters and symbols that may be used in any part of the name:
   A-Z a-z 0-9 ? ! & @ \ / £ $ € ¥ . , «» -
2. Symbols of which only one form per group may be used: *
   - " “”
   - ' ‘’
   - () [] {} <>
3. Symbols permitted only after the first 3 characters:
   * = # % +
A name may contain at most 160 permitted characters.
* To further elaborate on part 2:
(d) any other punctuation referred to in column 1 of table 2 in Schedule 1 but only in one of the forms set out opposite that punctuation in column 2 of that table.
This means that if a company name uses parentheses (), it should not also use square brackets [] or curly brackets {}; it should use parentheses only. Likewise, if a company name uses “”, it should not use ", and if it uses ‘’, it should not use '.
Here is my regex (with tests on Regex101), which works for PCRE, JavaScript, Python and Go:
/^[A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{3}[*=#%+A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{0,157}$/
This regex does not enforce part 2: it only checks which characters appear, so the test cases for the one-form-per-group rule fail.
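To make the gap concrete, here is a minimal Python sketch that compiles the pattern above and feeds it names that mix forms within a group. The sample names are made up for illustration and are not taken from the Regex101 tests.

```python
import re

# The pattern from the question, unchanged apart from dropping the /.../ delimiters.
BASE = re.compile(
    r"""^[A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{3}[*=#%+A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{0,157}$"""
)

# Hypothetical names that each break the one-form-per-group rule.
mixed = [
    'Acme (UK) [Holdings] Ltd',  # parentheses and square brackets
    'Acme "Best” Ltd',           # straight and curly double quotes
    "Acme's ‘Best’ Ltd",         # straight and curly apostrophes
]

# Collect the names the pattern accepts.
accepted = [name for name in mixed if BASE.match(name)]
print(accepted == mixed)
```

All three names pass, because a character class only says which characters are allowed and cannot relate one symbol in the name to another.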
I can solve this without regex, but I am interested to know whether it is possible with regex only.
Upvotes: 3
Views: 1203
Reputation: 7714
This solution is based on @Paolo's regex from the comments.
Regex101 with further tests:
Explanation:
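Three positive lookaheads, one per group (brackets, double quotes, apostrophes), assert that the name follows part (d) of the standard, i.e. that only one form per group is used. Within each lookahead, every alternative is wrapped in an atomic group for performance, so a failed alternative is not retried with different splits. The patterns are laid out in free-spacing style for readability, so they need the x flag in PCRE.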
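As a small illustration of what atomic grouping changes, here is a Python sketch on a toy pattern rather than the company-name regex. Because Python's re module lacks (?>...) before 3.11, the sketch uses a capturing lookahead followed by a backreference, which behaves the same way; the toy pattern and variable names are assumptions made for the example.

```python
import re

# Plain greedy quantifier: a+ can give back one "a" so the rest can match.
backtracking = re.compile(r"a+ab")

# (?=(a+))\1 behaves like the atomic group (?>a+): the lookahead captures
# the longest run of "a" and the backreference consumes exactly that text,
# so nothing can be given back afterwards.
atomic_like = re.compile(r"(?=(a+))\1ab")

print(bool(backtracking.fullmatch("aaab")))  # True
print(bool(atomic_like.fullmatch("aaab")))   # False
```

In the lookaheads of this answer the same property means that once an alternative has consumed its text and the following end-of-string check fails, the engine moves on to the next alternative instead of trying shorter splits.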
PCRE
/^
(?=
(?> [^{}()<>]* \[+ [^{}()<>]* \]+ [^{}()<>]* ) *$|
(?> [^[\]()<>]* \{+ [^[\]()<>]* \}+ [^[\]()<>]* ) *$|
(?> [^[\]{}<>]* \(+ [^[\]{}<>]* \)+ [^[\]{}<>]* ) *$|
(?> [^[\]{}()]* \<+ [^[\]{}()]* \>+ [^[\]{}()]* ) *$|
(?> [^[\]{}()<>]* ) *$
)
(?=
(?> [^"]* \“+ [^"]* \”+ [^"]* ) *$|
(?> [^“”]* \"+ [^“”]* \"+ [^“”]* ) *$|
(?> [^"“”]* ) *$
)
(?=
(?> [^']* ‘+ [^']* ’+ [^']* ) *$|
(?> [^‘’]* '+ [^‘’]* '+ [^‘’]* ) *$|
(?> [^'‘’]* ) *$
)
[A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{3}
[A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,\-*=#%+]{0,157}
$/
JavaScript and Python
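JavaScript does not support atomic group syntax, and Python's re module only gained it in version 3.11. For those engines, a capturing lookahead followed by a backreference, (?=(...))\1, emulates an atomic group and works quite well. The whitespace is only there for readability (use re.VERBOSE in Python; JavaScript has no free-spacing flag, so strip it there):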
/^
(?=
(?=( [^{}()<>]* \[+ [^{}()<>]* \]+ [^{}()<>]* ))\1 *$|
(?=( [^[\]()<>]* \{+ [^[\]()<>]* \}+ [^[\]()<>]* ))\2 *$|
(?=( [^[\]{}<>]* \(+ [^[\]{}<>]* \)+ [^[\]{}<>]* ))\3 *$|
(?=( [^[\]{}()]* \<+ [^[\]{}()]* \>+ [^[\]{}()]* ))\4 *$|
(?=( [^[\]{}()<>]* ))\5 *$
)
(?=
(?=( [^\"]* \“+ [^\"]* \”+ [^\"]* ))\6 *$|
(?=( [^“”]* \"+ [^“”]* \"+ [^“”]* ))\7 *$|
(?=( [^\"“”]* ))\8 *$
)
(?=
(?=( [^']* ‘+ [^']* ’+ [^']* ))\9 *$|
(?=( [^‘’]* '+ [^‘’]* '+ [^‘’]* ))\10 *$|
(?=( [^'‘’]* ))\11 *$
)
[A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{3}
[A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,\-*=#%+]{0,157}
$/
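For Python, the pattern above can be used as it stands in verbose mode. The following is a sketch under that assumption; the helper name is_valid and the sample names are hypothetical, and the layout whitespace inside the pattern has only been realigned.

```python
import re

# The JavaScript/Python pattern from above without the /.../ delimiters.
# re.VERBOSE ignores the layout whitespace; spaces inside [...] stay literal.
COMPANY_NAME = re.compile(r"""
^
(?=
  (?=( [^{}()<>]*  \[+ [^{}()<>]*  \]+ [^{}()<>]*  ))\1 *$|
  (?=( [^[\]()<>]* \{+ [^[\]()<>]* \}+ [^[\]()<>]* ))\2 *$|
  (?=( [^[\]{}<>]* \(+ [^[\]{}<>]* \)+ [^[\]{}<>]* ))\3 *$|
  (?=( [^[\]{}()]* \<+ [^[\]{}()]* \>+ [^[\]{}()]* ))\4 *$|
  (?=( [^[\]{}()<>]* ))\5 *$
)
(?=
  (?=( [^\"]* \“+ [^\"]* \”+ [^\"]* ))\6 *$|
  (?=( [^“”]* \"+ [^“”]* \"+ [^“”]* ))\7 *$|
  (?=( [^\"“”]* ))\8 *$
)
(?=
  (?=( [^']* ‘+ [^']* ’+ [^']* ))\9 *$|
  (?=( [^‘’]* '+ [^‘’]* '+ [^‘’]* ))\10 *$|
  (?=( [^'‘’]* ))\11 *$
)
[A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{3}
[A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,\-*=#%+]{0,157}
$
""", re.VERBOSE)


def is_valid(name: str) -> bool:
    # fullmatch avoids Python's "$ also matches before a trailing newline" quirk.
    return COMPANY_NAME.fullmatch(name) is not None


print(is_valid("Acme (UK) Ltd"))      # one bracket form only
print(is_valid("Acme (UK) [Ltd]"))    # two bracket forms mixed
print(is_valid('Acme "Best” Ltd'))    # straight and curly quotes mixed
print(is_valid("AB*C"))               # * inside the first 3 characters
```

One thing worth checking against your own requirements: as written, the quote and apostrophe alternatives expect at least two marks, so a name with a single apostrophe such as "O'Neil Ltd" is rejected by this pattern.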
Hopefully this will be useful to others.
Upvotes: 1