Alan Wayne
Alan Wayne

Reputation: 5384

How to remove all special characters and extra spaces between words with Regex in PostgreSQL

In migrating a text string from one database to another, I need to eliminate all special characters and keep only one space between words.

Unfortunately, the below code eliminates all spaces between words -- not what I want.

Here is the code I have. The "sig" field is wrong (it removes all spaces without leaving one space). Where did I go wrong?
TIA

WITH dbl_medications AS (
    SELECT * 
    FROM dblink('select medname, sig, form from medications')
    AS t1(medname text, sig text, form text)
    ORDER BY medname, form, sig
    )
INSERT INTO medications (medname, sig, form)    
    SELECT REGEXP_REPLACE(LOWER(REGEXP_REPLACE(medname,'[^a-zA-Z0-9 /-]','','g')), '^ +| +$| +(?= )', '', 'g'),
           REGEXP_REPLACE(LOWER(REGEXP_REPLACE(sig,'[^0-9a-zA-Z:/]',' ','g')), '^ +| +$| +(?= )', '', 'g'),
           LOWER(REGEXP_REPLACE(form,'[^a-zA-Z]','','g'))         
    FROM dbl_medications
    ORDER BY 1,3,2
ON CONFLICT (medname, sig, form) DO NOTHING;

Upvotes: 2

Views: 1493

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626689

You can use

REGEXP_REPLACE(LOWER(REGEXP_REPLACE(sig,'[^[:alnum:][:space:]:/]+',' ','g')), '^[[:space:]]+|[[:space:]]+$|[[:space:]]+(?=[[:space:]])', '', 'g')

The first [^[:alnum:][:space:]:/]+ regex is used to replace chunks of one or more chars other than alphanumeric, whitespace, : and / chars with a single space.

The ^[[:space:]]+|[[:space:]]+$|[[:space:]]+(?=[[:space:]]) regex is used to remove leading (^[[:space:]]+) and trailing ([[:space:]]+$) whitespaces, and remove excessive whitespace ([[:space:]]+(?=[[:space:]])).

Upvotes: 1

Related Questions