Kieran Dee
Kieran Dee

Reputation: 129

regex not working correctly when the test is fine

For my database, I have a list of company numbers where some of them start with two letters. I have created a regex which should eliminate these from a query and according to my tests, it should. But when executed, the result still contains the numbers with letters.

Here is my regex, which I've tested on https://www.regexpal.com

([^A-Z+|a-z+].*)

I've tested it against numerous variations such as SC08093, ZC000191 and NI232312 which shouldn't match and don't in the tests, which is fine.

My sql query looks like;

SELECT companyNumber FROM company_data 
WHERE companyNumber ~ '([^A-Z+|a-z+].*)' order by companyNumber desc

To summerise, strings like SC08093 should not match as they start with letters.

I've read through the documentation for postgres but I couldn't seem to find anything regarding this. I'm not sure what I'm missing here. Thanks.

Upvotes: 1

Views: 392

Answers (3)

S-Man
S-Man

Reputation: 23766

Not start with a letter could be done with

WHERE company ~ '^[^A-Za-z].*'

demo: db<>fiddle

The first ^ marks the beginning. The [^A-Za-z] says "no letter" (including small and capital letters).


Edit: Changed [A-z] into the more precise [A-Za-z] (Why is this regex allowing a caret?)

Upvotes: 2

user330315
user330315

Reputation:

Your pattern: [^A-Z+|a-z+].* means "a string where at least some characters are not A-Z" - to extend that to the whole string you would need to use an anchored regex as shown by S-Man (the group defined with (..) isn't really necessary btw)

I would probably use a regex that specifies want the valid pattern is and then use !~ instead.

where company !~ '^[0-9].*$'

^[0-9].*$ means "only consists of numbers" and the !~ means "does not match"

or

where not (company ~ '^[0-9].*$')

Upvotes: 2

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627537

The ~ '([^A-Z+|a-z+].*)' does not work because this is a [^A-Z+|a-z+].* regex matching operation that returns true even upon a partial match (regex matching operation does not require full string match, and thus the pattern can match anywhere in the string). [^A-Z+|a-z+].* matches a letter from A to Z, +,|or a letter fromatoz`, and then any amount of any zero or more chars, anywhere inside a string.

You may use

WHERE companyNumber NOT SIMILAR TO '[A-Za-z]{2}%'

See the online demo

Here, NOT SIMILAR TO returns the inverse result of the SIMILAR TO operation. This SIMILAR TO operator accepts patterns that are almost regex patterns, but are also like regular wildcard patterns. NOT SIMILAR TO '[A-Za-z]{2}%' means all records that start with two ASCII letters ([A-Za-z]{2}) and having anything after (%) are NOT returned and all others will be returned. Note that SIMILAR TO requires a full string match, same as LIKE.

Upvotes: 4

Related Questions