Query to remove "duplicate" rows using regexp

Question

I am using PostgreSQL. I have a table keywords:

# Table name: keywords
#
#  id         :integer not null, primary key
#  text       :string  not null
#  match_type :string  not null
#  adgroup_id :integer not null

The table has a unique index USING btree (match_type, adgroup_id, text).

The problem is that for the same adgroup_id and match_type there are texts like "Hello", " Hello", "Hello ", and " Hello " (note the leading/trailing whitespace). Because the text column keeps those spaces at the beginning and end of the string, the table contains bad data that would not have passed the unique index without the whitespace.

I plan to add whitespace trimming before insertion in the future, but first I need to clean up the data.

How do I remove the "duplicate" data leaving the unique ones (based on the string comparison without leading and trailing spaces)?

S-Man · Accepted Answer

demo:db<>dbfiddle (the example contains two groups: the "Hello" group has no element without surrounding whitespace; the "Bye" group has two elements without whitespace)

    DELETE FROM keywords
    WHERE id NOT IN (
        SELECT DISTINCT ON (trim(text))              --1
            id
        FROM
            keywords
        ORDER BY
            trim(text),
            text = trim(text) DESC                   --2
    )

1. Group on the trimmed text.
2. Order by the trimmed text, then by whether the text is already free of surrounding whitespace. If such an element exists, it is ordered first and taken by the DISTINCT ON clause. If there is none, another element is taken.
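
To see why the cleanly stored row wins, it can help to run the ordering on its own before deleting anything. The following is only a sketch against the keywords table from the question; the column aliases trimmed and is_clean are names made up for illustration. In PostgreSQL, false sorts before true, so DESC places the rows whose text is already trimmed at the top of each group, and DISTINCT ON would then keep that first row.

```sql
-- Inspect the order in which DISTINCT ON would see the rows.
-- "trimmed" and "is_clean" are illustrative aliases, not existing columns.
SELECT
    id,
    text,
    trim(text)        AS trimmed,
    text = trim(text) AS is_clean   -- true when there is no surrounding whitespace
FROM
    keywords
ORDER BY
    trim(text),                     -- rows of one group end up next to each other
    text = trim(text) DESC;         -- true first: the clean row leads its group
```

The first row listed for each value of trimmed is the one the DELETE above would leave in place.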

The solution containing the additional columns:

    DELETE FROM keywords
    WHERE id NOT IN (
        SELECT DISTINCT ON (match_type, adgroup_id, trim(text))
            id
        FROM
            keywords
        ORDER BY 
            match_type,
            adgroup_id,
            trim(text), 
            text = trim(text) DESC
    )
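
Note that the DELETE only removes the surplus rows. If a group had no clean element, the surviving row still carries its whitespace. A possible follow-up, sketched here as an assumption and not part of the original answer, is to trim the remaining rows in the same transaction. Since only one row per (match_type, adgroup_id, trimmed text) is left after the DELETE, the UPDATE should not collide with the unique index.

```sql
BEGIN;

-- Step 1: keep one row per (match_type, adgroup_id, trimmed text),
-- preferring a row whose text is already trimmed.
DELETE FROM keywords
WHERE id NOT IN (
    SELECT DISTINCT ON (match_type, adgroup_id, trim(text))
        id
    FROM
        keywords
    ORDER BY
        match_type,
        adgroup_id,
        trim(text),
        text = trim(text) DESC
);

-- Step 2 (assumed follow-up, not in the answer above):
-- strip the whitespace from the rows that survived.
UPDATE keywords
SET text = trim(text)
WHERE text <> trim(text);

COMMIT;
```

Keep in mind that trim(text) without further arguments strips only space characters. If the data may also contain tabs or newlines, a different expression would be needed in both statements.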
