Reputation: 3108
Im working with AWS Athena to concat a few rows to a single row.
xid pid sequence text
1 1 0 select * from
1 1 1 mytbl
1 1 2
2 1 0 update test
2 1 1 set mycol=
2 1 2 'a';
So want to contact the text column.
xid pid text
1 1 select * from mytbl
2 1 update test set mycol='a';
I ran the following query to partition it first with proper order and do the concat.
with cte as
(SELECT
xid,
pid,
sequence,
text,
row_number()
OVER (PARTITION BY xid,pid
ORDER BY sequence) AS rank
FROM unload
GROUP BY xid,pid,sequence,text
)
SELECT
xid,
pid,
array_join(array_agg(text),'') as text
FROM cte
GROUP BY xid,pid
But if you see the below output the order got misplaced.
xid pid text
1 1 mytblselect * from
2 1 update test'a'; set mycol=
I checked the Presto documentation, the latest version supports order by in array agg, but Athena is using Presto 0.172, so Im not sure whether it is supported or not.
What is the workaround for this in Athena?
Upvotes: 4
Views: 3088
Reputation: 41
One approach:
text
text
WITH cte AS (
SELECT
xid, pid, text
-- create a sortable 19-digit ranking string
, SUBSTR(
LPAD(
CAST(
ROW_NUMBER() OVER (PARTITION BY xid, pid ORDER BY sequence)
AS VARCHAR)
, 19
, '0')
, -19) AS SEQ_STR
FROM unload
)
SELECT
xid, pid
-- make sortable string, aggregate into array
-- then sort array, revert each element to original text
-- finally combine array elements into one string
, ARRAY_JOIN(
TRANSFORM(
ARRAY_SORT(
ARRAY_AGG(SEQ_STR || text))
, combined -> SUBSTR(combined, 1 + 19))
, ' '
, '') AS TEXT
FROM cte
GROUP BY xid, pid
ORDER BY xid, pid
This code assumes:
xid
+ pid
+ sequence
is unique for all input recordsxid
+ pid
+ sequence
(eg, not more than 20 million)Upvotes: 2