Reputation: 6710
I am using Amazon Redshift.
I have a column in which a string is stored as comma-separated values, like Private, Private, Private, Private, Private, Private, United Healthcare. I want to remove the duplicates from it with a query, so the result should be Private, United Healthcare. I found some solutions on Stack Overflow and learned that it should be possible using regular expressions.
Hence, I have tried:
SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\1') AS insurances;
And
SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\g') AS insurances;
I have also tried some other regular expressions, but none of them seem to work. Any solution?
Upvotes: 2
Views: 2620
Reputation: 269191
Here is a User-Defined Function (UDF) for Amazon Redshift:
CREATE FUNCTION f_uniquify (s text)
RETURNS text
IMMUTABLE
AS $$
-- Split string by comma-space, remove duplicates, convert back to comma-separated
return ', '.join(set(s.split(', ')))
$$ LANGUAGE plpythonu;
Testing it with:
select f_uniquify('Private, Private, Private, Private, Private, Private, United Healthcare');
Returns:
United Healthcare, Private
If the order of return values is important, then it would need some more specific code.
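For example, a minimal order-preserving sketch (the name f_uniquify_ordered is hypothetical, and this has not been tested on a cluster) would keep only the first occurrence of each value:
CREATE FUNCTION f_uniquify_ordered (s text)
RETURNS text
IMMUTABLE
AS $$
# Hypothetical sketch: keep the first occurrence of each value, preserving input order
seen = []
for part in s.split(', '):
    if part not in seen:
        seen.append(part)
return ', '.join(seen)
$$ LANGUAGE plpythonu;
With the sample input above, this should return Private, United Healthcare.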
Upvotes: 4
Reputation: 2166
An alternative option is to try a Python UDF. A simple Python function can dedupe the string and return the corrected version.
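For example, the core deduplication logic can be sketched in plain Python and tested locally before wrapping it in a UDF (a minimal sketch; the sort is only there to make the output deterministic):
def dedupe(s):
    # Split on ', ', drop duplicates with a set, and re-join
    return ', '.join(sorted(set(s.split(', '))))

print(dedupe('Private, Private, Private, Private, Private, Private, United Healthcare'))
# prints: Private, United Healthcare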
Upvotes: 1
Reputation: 2243
Try this way,
SELECT array_agg(DISTINCT insurances)
FROM (SELECT regexp_split_to_table('Private, Private, Private, Private, Private, Private, United Healthcare', ',\s+') AS insurances) x;
An alternative way:
SELECT DISTINCT UNNEST(regexp_split_to_array('Private, Private, Private, Private, Private, Private, United Healthcare', ',\s+')) AS insurances;
Checking http://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html, both will fail on Redshift, since none of its string functions convert text to text[].
Upvotes: 1