Reputation: 16860
motivation: This seems kind of terrible, but I'm trying to write string_agg
in Redshift using multiple queries, which will coalesce neighboring rows. My maximum group size isn't that big, so I think the query would only run for a few iterations. I've managed to preprocess my data in a form that's like,
key | merge index | value
a | 0 | foo
a | 0 | bar
a | 1 | baz
b | 0 | fandangle
in one step, everything with the same (key, merge_index)
should be concatenated, so we get,
key | merge index | value
a | 0 | foo, bar
a | 1 | baz
b | 0 | fandangle
I want to use first_value
and last_value
in a GROUP BY
statement like so,
SELECT key,
merge_index,
FIRST_VALUE(value) || COALESCE((', ' || NTH_VALUE(value, 2)), '')
GROUP BY key, merge_index;
but, of course, you can't do that because FIRST_VALUE
and NTH_VALUE
are window functions, not aggregate functions.
question: Why can't I use FIRST_VALUE
and friends in a GROUP BY
group?
note: It works functionally to do a SELECT DISTINCT
, omit the GROUP BY
, and use the relevant OVER (PARTITION BY key, merge_index)
windows, but I can't imagine this is efficient if it's trying to deduplicate the entire result table. I also realize I could do more preprocessing and add a column like left_or_right
which indicates which side it's trying to merge, and then use a left join. That also doesn't seem too efficient, but maybe it's not bad.
Upvotes: 4
Views: 4846
Reputation: 127
I like David's queries, but he didn't get into why:
Window functions are last part of the query to be executed, after the grouping and ordering. Because of this, a window function always outputs one value per record in the final data set. You can use aggregates inside window functions, but not window functions inside aggregates. To achieve your goal, you need to do another pass over the data set to aggregate, which is accomplished with a subquery.
Upvotes: 3
Reputation: 12485
Have you tried something like the following? This way you can avoid FIRST_VALUE()
and NTH_VALUE()
as well as aggregation:
WITH p AS (
SELECT key, merge, value
, ROW_NUMBER() OVER ( PARTITION BY key, merge ) AS rn
FROM mytable
)
SELECT p1.key, p1.merge, p1.value || p1.value || COALESCE(',' || p2.value, '')
FROM p p1 LEFT JOIN p p2
ON p1.key = p2.key
AND p1.merge = p2.merge
AND p2.rn = 2
WHERE p1.rn = 1
Please see SQL Fiddle demo here. Yes, I did use Postgres 9 for the fiddle; I couldn't get a connection on 8 (but I don't think I'm using any features of 9).
Alternately, you might use the following and avoid a self-join:
WITH p AS (
SELECT key, merge, value
, LEAD(value) OVER ( PARTITION BY key, merge ) AS next_value
, ROW_NUMBER() OVER ( PARTITION BY key, merge ) AS rn
FROM mytable
)
SELECT key, merge, value || COALESCE(',' || next_value, '')
FROM p
WHERE rn = 1
SQL Fiddle here. If you knew in advance how many values you needed to concatenate, you could make multiple calls to LEAD()
with increasing offset values (more SQL Fiddle):
WITH p AS (
SELECT key, merge, value
, LEAD(value) OVER ( PARTITION BY key, merge ) AS next_value
, LEAD(value,2) OVER ( PARTITION BY key, merge ) AS n2_value
, LEAD(value,3) OVER ( PARTITION BY key, merge ) AS n3_value
, ROW_NUMBER() OVER ( PARTITION BY key, merge ) AS rn
FROM mytable
)
Upvotes: 2