Reputation: 6710
I am using Amazon Redshift.
I have a column in which a string is stored as comma-separated values, like Private, Private, Private, Private, Private, Private, United Healthcare. I want to remove the duplicates from it with a query, so the result should be Private, United Healthcare. I found some solutions on Stack Overflow and learned that it should be possible using regular expressions.
Hence, I have tried:
SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\1') AS insurances;
And
SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\g') AS insurances;
I have also tried some other regular expressions, but none of them seem to work. Any solution?
Upvotes: 2
Views: 2620
Reputation: 269191
Here is a User-Defined Function (UDF) for Amazon Redshift:
CREATE FUNCTION f_uniquify (s text)
RETURNS text
IMMUTABLE
AS $$
-- Split string by comma-space, remove duplicates, convert back to comma-separated
return ', '.join(set(s.split(', ')))
$$ LANGUAGE plpythonu;
Testing it with:
select f_uniquify('Private, Private, Private, Private, Private, Private, United Healthcare');
Returns:
United Healthcare, Private
If the order of return values is important, then it would need some more specific code.
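For example, a minimal order-preserving sketch (the name f_uniquify_ordered is hypothetical, and this has not been tested on a cluster) would keep only the first occurrence of each value:
CREATE FUNCTION f_uniquify_ordered (s text)
RETURNS text
IMMUTABLE
AS $$
# Hypothetical sketch: keep the first occurrence of each value, preserving input order
seen = []
for part in s.split(', '):
    if part not in seen:
        seen.append(part)
return ', '.join(seen)
$$ LANGUAGE plpythonu;
With the sample input above, this should return Private, United Healthcare.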
Upvotes: 4
Reputation: 2166
An alternative option is to try a Python UDF. A simple Python function can dedupe the string and return the corrected version.
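For example, the core deduplication logic can be sketched in plain Python and tested locally before wrapping it in a UDF (a minimal sketch; the sort is only there to make the output deterministic):
def dedupe(s):
    # Split on ', ', drop duplicates with a set, and re-join
    return ', '.join(sorted(set(s.split(', '))))

print(dedupe('Private, Private, Private, Private, Private, Private, United Healthcare'))
# prints: Private, United Healthcare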
Upvotes: 1
Reputation: 2243
Try this way,
SELECT array_agg(DISTINCT insurances)
FROM (SELECT regexp_split_to_table('Private, Private, Private, Private, Private, Private, United Healthcare', ',\s+') AS insurances) x;
An alternative way:
SELECT DISTINCT UNNEST(regexp_split_to_array('Private, Private, Private, Private, Private, Private, United Healthcare', ',\s+')) AS insurances;
Checking http://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html, both will fail on Redshift, since none of its string functions convert text to text[].
Upvotes: 1