u-ways

Reputation: 7714

Regex - match a valid company name (UK regulations)

I am trying to build a regex that matches valid UK company names according to The Company Regulations 2015 - Permitted characters. The rules I am trying to match:

  1. Permitted word characters and symbols may be used in any part of the name:

    A-Z a-z 0-9 ? ! & @ \ / £ $ € ¥ . , «» -

  2. Symbols of which only one form per group may be used: *

    " “” - ' ‘’ - () [] {} <>

  3. Symbols permitted only after the first 3 characters:

    * = # % +

  4. 160 permitted characters max.

* To further elaborate on part 2:

(d) any other punctuation referred to in column 1 of table 2 in Schedule 1 but only in one of the forms set out opposite that punctuation in column 2 of that table.

This means that if a company name uses parentheses (), it should not also use square brackets [] or curly brackets {}; it should use parentheses only. Likewise, if a company name uses “”, it should not use ", and if it uses ‘’, it should not use '.
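For reference, the one-form-per-group rule described above can be sketched without regex. This is only an illustration of the rule as I read it; the names `PUNCTUATION_GROUPS` and `uses_one_form_per_group` are hypothetical and not part of any library.

```python
# Each inner list holds the alternative forms of one punctuation group.
# A name may draw characters from at most one form per group.
PUNCTUATION_GROUPS = [
    ["()", "[]", "{}", "<>"],  # brackets
    ['"', "“”"],               # double quotation marks
    ["'", "‘’"],               # single quotation marks / apostrophes
]


def uses_one_form_per_group(name: str) -> bool:
    """Return True if the name uses at most one form from every group."""
    for forms in PUNCTUATION_GROUPS:
        # Collect the forms that contribute at least one character to the name.
        used = [form for form in forms if any(ch in name for ch in form)]
        if len(used) > 1:
            return False
    return True
```

With this sketch, "Acme (UK) Ltd" passes, while "Acme (UK) [Holdings]" is rejected because it mixes parentheses with square brackets.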


Here is my Regex101 with tests that work for PCRE, JavaScript, Python and Go:

/^[A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{3}[*=#%+A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{0,157}$/

This regex does not enforce part 2: it fails the test cases for the one-type-per-group rule.
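To make the gap concrete, here is a small Python check of the pattern above (delimiters removed, otherwise copied verbatim). The sample names are made up for illustration.

```python
import re

# The pattern from the question, without the surrounding / delimiters.
ORIGINAL = re.compile(
    r"""^[A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{3}[*=#%+A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{0,157}$"""
)

# Mixes () with [], which part 2 forbids, yet the pattern accepts it.
mixed_brackets = ORIGINAL.match("Acme (UK) [Holdings] Ltd")

# Part 3 is enforced: * is rejected within the first 3 characters...
early_symbol = ORIGINAL.match("A*B Ltd")

# ...and accepted after them.
late_symbol = ORIGINAL.match("Acme* Ltd")

print(mixed_brackets is not None)  # True (a false positive)
print(early_symbol is not None)    # False
print(late_symbol is not None)     # True
```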

I can solve this without regex, but I am interested to know whether it is possible with regex alone.

Upvotes: 3

Views: 1203

Answers (1)

u-ways

Reputation: 7714

This solution is based on @Paolo's regex from the comments.

Regex101 with further tests:

Explanation:

Positive lookaheads assert that the string follows part (d) of the regulations (only one form per punctuation group may be used). Within each lookahead, every alternative is wrapped in an atomic group for performance.
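The lookahead idea on its own can be sketched in Python as follows. This is a deliberately simplified variant, not the full pattern: each lookahead only asserts that the rest of the string avoids all but one form of a group, and it does not check that an opening symbol comes before its closing one. The names `ONE_FORM` and `has_one_form_per_group` are hypothetical.

```python
import re

# Three lookaheads anchored at the start; each scans to the end of the string.
ONE_FORM = re.compile(
    r"""
    ^
    # brackets: at most one of (), [], {}, <> may appear
    (?= [^\[\]{}<>]* $ | [^(){}<>]* $ | [^()\[\]<>]* $ | [^()\[\]{}]* $ )
    # double quotes: either no straight quotes or no curly quotes
    (?= [^"]* $ | [^“”]* $ )
    # single quotes: either no straight quotes or no curly quotes
    (?= [^']* $ | [^‘’]* $ )
    """,
    re.VERBOSE,
)


def has_one_form_per_group(name: str) -> bool:
    """Return True if every lookahead succeeds at the start of the name."""
    return ONE_FORM.match(name) is not None
```

Because every alternative is a negated character class followed by the end anchor, each lookahead scans the string at most once per alternative, so this simplified form does not need atomic groups.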

PCRE

/^
  (?=
    (?>  [^{}()<>]*   \[+  [^{}()<>]*   \]+  [^{}()<>]*   )  *$|
    (?>  [^[\]()<>]*  \{+  [^[\]()<>]*  \}+  [^[\]()<>]*  )  *$|
    (?>  [^[\]{}<>]*  \(+  [^[\]{}<>]*  \)+  [^[\]{}<>]*  )  *$|
    (?>  [^[\]{}()]*  \<+  [^[\]{}()]*  \>+  [^[\]{}()]*  )  *$|
    (?>  [^[\]{}()<>]*                                    )  *$
  )

  (?=
    (?>  [^"]*   \“+  [^"]*   \”+  [^"]*   )  *$|
    (?>  [^“”]*  \"+  [^“”]*  \"+  [^“”]*  )  *$|
    (?>  [^"“”]*                           )  *$
  )

  (?=
    (?>  [^']*   ‘+  [^']*  ’+  [^']*   )  *$|
    (?>  [^‘’]*  '+  [^‘’]* '+  [^‘’]*  )  *$|
    (?>  [^'‘’]*                        )  *$
  )

  [A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{3}
  [A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,\-*=#%+]{0,157}
$/

JavaScript and Python

JavaScript does not support atomic group syntax, and Python's re module only gained it in version 3.11. Where it is unavailable, you can emulate an atomic group with a lookahead followed by a backreference, (?=(...))\1, which works quite well:

/^
  (?=
    (?=(  [^{}()<>]*   \[+  [^{}()<>]*   \]+  [^{}()<>]*   ))\1  *$|
    (?=(  [^[\]()<>]*  \{+  [^[\]()<>]*  \}+  [^[\]()<>]*  ))\2  *$|
    (?=(  [^[\]{}<>]*  \(+  [^[\]{}<>]*  \)+  [^[\]{}<>]*  ))\3  *$|
    (?=(  [^[\]{}()]*  \<+  [^[\]{}()]*  \>+  [^[\]{}()]*  ))\4  *$|
    (?=(  [^[\]{}()<>]*                                    ))\5  *$
  )

  (?=
    (?=(  [^\"]*  \“+  [^\"]*  \”+  [^\"]*  ))\6  *$|
    (?=(  [^“”]*  \"+  [^“”]*  \"+  [^“”]*  ))\7  *$|
    (?=(  [^\"“”]*                          ))\8  *$
  )

  (?=
    (?=(  [^']*   ‘+  [^']*  ’+  [^']*   ))\9   *$|
    (?=(  [^‘’]*  '+  [^‘’]* '+  [^‘’]*  ))\10  *$|
    (?=(  [^'‘’]*                        ))\11  *$
  )

  [A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,-]{3}
  [A-Za-z0-9 \"“”'‘’()[\]{}<>«»\\\/?!&@£$€¥.,\-*=#%+]{0,157}
$/

Hopefully this will be useful to others.

Upvotes: 1
