Reputation: 1609
I have the following data:
WITH data as (
SELECT 18 AS value, 1 AS id, "A" AS other_value,
UNION ALL SELECT 20 AS value, 1 AS id, "B",
UNION ALL SELECT 22 AS value, 2 AS id, "C"
UNION ALL SELECT 30 AS value, 3 AS id, "A"
UNION ALL SELECT 37 AS value, 2 AS id, "B"
UNION ALL SELECT 31 AS value, 2 AS id, "C"
UNION ALL SELECT 42 AS value, 1 AS id, "D"
)
I am using the following query
select
FIRST_VALUE(id) over w1 as id
, ARRAY_AGG(value) over w1 as data
, FIRST_VALUE(other_value) over w1 as first_other_data
, LAST_VALUE(other_value) over w1 as last_other_data
from data
WINDOW w1 as (PARTITION BY id order by value ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
And I get
id data first_other_data last_other_data
1 18 A D
20
42
1 18 A D
20
42
1 18 A D
20
42
2 22 C B
31
37
2 22 C B
31
37
2 22 C B
31
37
3 30 A A
But i am getting duplicates that I don't want. I was thinking to use distinct
keyword, but bigquery do not like it. My expected result is :
id data first_other_data last_other_data
1 18 A D
20
42
2 22 C B
31
37
3 30 A A
I have found similar questions but not exactly this case. Thanks EDIT: In my attempt to simplify the scenario for this SO question I took out some essential components. I have modified this with a more accurate version of my problem.
Upvotes: 2
Views: 1952
Reputation: 172944
The easiest way to achieve your goal without changing your initial query is to wrap it with extra select as in below example
#standardSQL
SELECT ANY_VALUE(t).* FROM (
SELECT
FIRST_VALUE(id) OVER w1 AS id
, ARRAY_AGG(value) OVER w1 AS data
, FIRST_VALUE(other_value) OVER w1 AS first_other_data
, LAST_VALUE(other_value) OVER w1 AS last_other_data
FROM data
WINDOW w1 AS (PARTITION BY id ORDER BY value ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) t
GROUP BY FORMAT('%t', t)
with result
Upvotes: 2
Reputation: 4736
I think you still want an aggregation with grouping :) Since value and other_value seem to be related (you order by value in the window to select other_value) I'd simply put them into the array, too:
WITH data as (
SELECT 18 AS value, 1 AS id, "A" AS other_value,
UNION ALL SELECT 20 AS value, 1 AS id, "B",
UNION ALL SELECT 22 AS value, 2 AS id, "C"
UNION ALL SELECT 30 AS value, 3 AS id, "A"
UNION ALL SELECT 37 AS value, 2 AS id, "B"
UNION ALL SELECT 31 AS value, 2 AS id, "C"
UNION ALL SELECT 42 AS value, 1 AS id, "D"
)
select
id,
array_agg(struct(value, other_value)) as data
from data
group by id
So, if you need it later you can write a subquery to get it - or you can add another step to do in one query with the aggregation in between:
WITH data as (
SELECT 18 AS value, 1 AS id, "A" AS other_value,
UNION ALL SELECT 20 AS value, 1 AS id, "B",
UNION ALL SELECT 22 AS value, 2 AS id, "C"
UNION ALL SELECT 30 AS value, 3 AS id, "A"
UNION ALL SELECT 37 AS value, 2 AS id, "B"
UNION ALL SELECT 31 AS value, 2 AS id, "C"
UNION ALL SELECT 42 AS value, 1 AS id, "D"
),
temp as (
select
id,
array_agg(struct(value, other_value)) as data
from data
group by id
)
select
id,
array(select value from unnest(data)) as data,
(select other_value from unnest(data) order by value ASC limit 1) first_other_data,
(select other_value from unnest(data) order by value DESC limit 1) last_other_data
from temp
Upvotes: 3