Reputation: 49
I am trying to figure out how to identify any rows in a table that contains special characters but not spaces.
The reason is I am looking at data quality of a first name field and want to identify any rows that have special characters such as (-,.etc)
The below identifies almost what I need but also brings back names with spaces (Joel Smith) these should not be included.
SELECT
first_name
FROM users
WHERE
users.first_name ~* '[^a-z0-9]'
Many thanks in advance
Upvotes: 1
Views: 6602
Reputation: 522762
Postgres regex doesn't seem to support negative lookaheads, which is one option we could have used here. Instead, we may phrase your requirement for matching using this pattern:
^[^\- ]*[^\-a-z0-9 ][^\- ]*$
This says to match any number of non space/hyphen characters, followed by a single non alpha/space/hyphen character, followed again by any number of non space/hyphen characters. This becomes the following in Postgres:
^[^\-[:space:]]*[^\-a-z0-9[:space:]][^\-[:space:]]*$
Here is a sample query:
WITH yourTable AS (
SELECT 'abc' AS col UNION ALL
SELECT 'abc ' UNION ALL
SELECT 'abc $' UNION ALL
SELECT 'abc$' UNION ALL
SELECT 'a-bc' UNION ALL
SELECT 'a-bc ' UNION ALL
SELECT 'a-bc $' UNION ALL
SELECT 'a-bc$'
)
SELECT *
FROM yourTable
WHERE col ~* '^[^\-[:space:]]*[^\-a-z0-9[:space:]][^\-[:space:]]*$';
abc$
Upvotes: 2