JoelG
JoelG

Reputation: 49

PostgreSQL Regular Expression identifying only special characters

I am trying to figure out how to identify any rows in a table that contains special characters but not spaces.

The reason is I am looking at data quality of a first name field and want to identify any rows that have special characters such as (-,.etc)

The below identifies almost what I need but also brings back names with spaces (Joel Smith) these should not be included.

SELECT
    first_name
FROM users
WHERE
    users.first_name ~* '[^a-z0-9]' 

Many thanks in advance

Upvotes: 1

Views: 6602

Answers (1)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522762

Postgres regex doesn't seem to support negative lookaheads, which is one option we could have used here. Instead, we may phrase your requirement for matching using this pattern:

^[^\- ]*[^\-a-z0-9 ][^\- ]*$

This says to match any number of non space/hyphen characters, followed by a single non alpha/space/hyphen character, followed again by any number of non space/hyphen characters. This becomes the following in Postgres:

^[^\-[:space:]]*[^\-a-z0-9[:space:]][^\-[:space:]]*$

Here is a sample query:

WITH yourTable AS (
    SELECT 'abc' AS col UNION ALL
    SELECT 'abc  '  UNION ALL
    SELECT 'abc $'  UNION ALL
    SELECT 'abc$'   UNION ALL
    SELECT 'a-bc'   UNION ALL
    SELECT 'a-bc  ' UNION ALL
    SELECT 'a-bc $' UNION ALL
    SELECT 'a-bc$'
)

SELECT *
FROM yourTable
WHERE col ~* '^[^\-[:space:]]*[^\-a-z0-9[:space:]][^\-[:space:]]*$';

abc$

Demo

Upvotes: 2

Related Questions