Reputation: 1292
I have a SQL Query containing a
REGEXP_REPLACE(LISTAGG(foo.name, ',') WITHIN GROUP (ORDER BY foo.name), '([^,]+)(,\1)+', '\1') AS bar
in its SELECT
. LISTAGG
concatenates the values of the foo.name column with a ',' as separator, whereas REGEXP_REPLACE
replaces duplicates.
When I change the LISTAGG
so that a ", " (comma followed by a whitespace character) is used as separator, how do I have to adjust the REGEXP_REPLACE
?
Edit:
When changing the REGEXP_REPLACE
to '([^,]+)(, \1)+'
it seems to work and I get
CITRONENSÄURE, KALIUMSORBAT, PEKTIN
But when changing the REGEXP_REPLACE
to '^([^,]+)(, \1)+$'
I still have duplicates:
CITRONENSÄURE, CITRONENSÄURE, CITRONENSÄURE, CITRONENSÄURE, CITRONENSÄURE, KALIUMSORBAT, KALIUMSORBAT, KALIUMSORBAT, KALIUMSORBAT, KALIUMSORBAT, PEKTIN, PEKTIN, PEKTIN, PEKTIN, PEKTIN
So, is '([^,]+)(, \1)+'
the right one, regardless of having no ^
and $
?
Upvotes: 0
Views: 703
Reputation: 6521
Exactly what you did in LISTAGG
, add a space after the comma
'([^,]+)(, \1)+'
Though you should make sure it matches the whole string, anchoring at the beggining of the string (^
) and at the end ($
).
'^([^,]+)(, \1)+$'
Edit:
As for the regex syntax, to answer the edit in your question, if you want to remove duplicates within the same line, you have to make sure it's matching the entire value. For example, '([^,]+)(, \1)+'
will match in 'fo[o], [o]ther'
. So we'll match the trailing comma as well (or end of string).
([^,]+), (\1(, |$))+
This expression is much safer than the one you were trying to use. However, there are a few cases where it could fail.
select regexp_replace(
listagg("name", ', ') within group (order by "name")
,'([^,]+), (\1(, |$))+', '\1\3')
as "bar"
from foo;
When could it fail?
The expression is not 100% safe because the first word is not anchored, and thus it can match at the middle of a value. For example, it will remove WORD
from the list:
'AWORD, WORD, XXX' ==> 'AWORD, XXX'
====--^^^^--
I don't think there's a way to avoid this, since Oracle implements POSIX EREs and does not support lookarrounds, word boundaries nor \G
assertions. Moreover, this is not the proper way to remove duplicate values. With really long tables, you'll end up in ORA-01489: result of string concatenation is too long
.
I'd recommend using GROUP BY
or DISTINCT
for that case. You can read about it in this article: SQL/mysql - Select distinct/UNIQUE but return all columns?.
Upvotes: 1