Reputation: 63
So been working with some data and currently have output along the lines of
Customer | Reasons
Customer1 | Answer1, Answer3, Answer2, Answer4, Answer5, Answer1, Answer3, Answer1
Is there anyway in Big Query standard sql to rid myself of duplicates within this string and end with the output below?
Customer | Reasons
Customer1 | Answer1, Answer3, Answer2, Answer4, Answer5
Thanks in advance
Upvotes: 1
Views: 4791
Reputation: 172993
While voting up Elliott's answer - wanted to add another option (BigQuery Standard SQL):
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Customer1' customer, 'Answer1, Answer3, Answer2, Answer4, Answer5, Answer1, Answer3, Answer1' answers
)
SELECT * REPLACE(
ARRAY_TO_STRING(ARRAY(SELECT DISTINCT answer
FROM UNNEST(SPLIT(answers, ', ')) AS answer
), ', ') AS answers)
FROM `project.dataset.table`
which produces result you need
Row customer answers
1 Customer1 Answer1, Answer3, Answer2, Answer4, Answer5
In case if for some reason you would wanted to have those values ordered - you just add one line as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Customer1' customer, 'Answer1, Answer3, Answer2, Answer4, Answer5, Answer1, Answer3, Answer1' answers
)
SELECT * REPLACE(
ARRAY_TO_STRING(ARRAY(SELECT DISTINCT answer
FROM UNNEST(SPLIT(answers, ', ')) AS answer
ORDER BY answer
), ', ') AS answers)
FROM `project.dataset.table`
with result as
Row customer answers
1 Customer1 Answer1, Answer2, Answer3, Answer4, Answer5
Note: most likely need in ordering is not relevant for particular use case in your question - it can be handy in other cases
Upvotes: 3
Reputation: 33745
Assuming I understood the question correctly, you want something like:
SELECT
(SELECT STRING_AGG(DISTINCT s, ', ')
FROM UNNEST(SPLIT(Customer1, ', ')) AS s) AS Customer1
FROM dataset.table
This splits the string on the ', '
separator, then aggregates the substrings into a new string with duplicates removed using the DISTINCT
keyword.
Upvotes: 6