Reputation: 106
This is a very general need and can help many people. It appears whenever we need to group information and summarize the grouped values as a string as follows:
person | Visit Date | Visit |
---|---|---|
First | 2020-01-01 | 1 |
First | 2020-01-02 | 0 |
First | 2020-01-03 | 1 |
First | 2020-01-04 | 1 |
First | 2020-01-05 | 0 |
Second | 2020-01-01 | 0 |
Second | 2020-01-02 | 1 |
Second | 2020-01-03 | 1 |
Second | 2020-01-04 | 0 |
Second | 2020-01-05 | 1 |
Grouping values by the person column:
person | MIN(Visit Date) | Sequence of Visits |
---|---|---|
First | 2020-01-01 | 10110 |
Second | 2020-01-01 | 01101 |
I am using Athena for this solution so I am not as free to write a procedural function.
Upvotes: 1
Views: 1721
Reputation: 12929
In the general case, you can use the combination of reduce and array_agg as follows:
WITH example_table AS
(SELECT 'First' as person, '2020-01-01' as _date, 1 as _value UNION ALL
SELECT 'First', '2020-01-02', 0 UNION ALL
SELECT 'First', '2020-01-03', 1 UNION ALL
SELECT 'Second', '2020-01-01', 1 UNION ALL
SELECT 'Second', '2020-01-02', 0)
SELECT person, min(_date), reduce(array_agg(_value ORDER BY _date), '', (s, x) -> CAST(s as varchar) || CAST(x as varchar), s -> s)
FROM example_table
GROUP BY person
The array_agg(_value ORDER BY _date)
will create ARRAY [1,0,1]
, for example, for the first
person. The reduce part is concatenating the varchar of the numbers in the above array.
For the specific output of varchar, you can also use array_join, which makes it even shorter:
WITH example_table AS
(SELECT 'First' as person, '2020-01-01' as _date, 1 as _value UNION ALL
SELECT 'First', '2020-01-02', 0 UNION ALL
SELECT 'First', '2020-01-03', 1 UNION ALL
SELECT 'Second', '2020-01-01', 1 UNION ALL
SELECT 'Second', '2020-01-02', 0)
SELECT person, min(_date), array_join(array_agg(_value ORDER BY _date),'')
FROM example_table
GROUP BY person
BTW, try to avoid using generic names such as value or date in your column name. You will most likely hit some reserved words in one language or another, and it is hard to understand the meaning of your logic.
Upvotes: 1
Reputation: 1270431
In Athena, you would use array_agg()
for this. But unfortunately, it comes with this very loud warning:
ORDER BY is not supported for aggregation functions, for example, you cannot use it within array_agg(x).
And I assume that ordering is important.
If you know that maximum number of values for each group, you can use conditional aggregation like this:
select person, min(date),
( max(case when seqnum = 1 then value else '' end) ||
max(case when seqnum = 2 then value else '' end) ||
max(case when seqnum = 3 then value else '' end) ||
max(case when seqnum = 4 then value else '' end) ||
max(case when seqnum = 5 then value else '' end)
) as values_5
from (select t.*,
row_number() over (partition by person order by date) as seqnum
from t
) t
group by person;
Upvotes: 1