Reputation: 25
I am stuck with a specific scenario of flattening the data and need help for it. I need the output as flattened data where the column values are not fixed. Due to this I want to restrict the output to fixed set of columns.
Given Table 'test_table'
ID | Name | Property |
---|---|---|
1 | C1 | xxx |
2 | C2 | xyz |
2 | C3 | zz |
The scenario is, column Name can have any no. of values corresponding to an ID. I need to flatten the data based in such a way that there is one row per ID field. Since the Name field varies with each ID, I want to flatten it for fix 3 columns like Co1, Co2, Co3. The output should look like
ID | Co1 | Co1_Property | Co2 | Co2_Property | Co3 | Co3_Property |
---|---|---|---|---|---|---|
1 | C1 | xxx | null | null | ||
2 | C2 | xyz | C3 | zz |
Could not think of a solution using Pivot or aggregation. Any help would be appreciated.
Upvotes: 0
Views: 2440
Reputation: 172993
All current answers are too verbose and involve heavy repetition of same fragments of code again and again and if you need to account more columns you need to copy paste and add more lines which will make it even more verbose!
My preference is to avoid such type of coding and rather use something more generic as in below example
select * from (
select *, row_number() over(partition by id) col
from `project.dataset.table`)
pivot (max(name) as name, max(property) as property for col in (1, 2, 3))
If applied to sample data in your question - output is
If you want to change number of output columns - you just simply modify for col in (1, 2, 3)
part of query.
For example if you would wanted to have 5 columns - you would use for col in (1, 2, 3, 4, 5)
- that simple!!!
Upvotes: 1
Reputation: 1269823
You can use arrays:
select id,
array_agg(name order by name)[safe_ordinal(1)] as name_1,
array_agg(property order by name)[safe_ordinal(1)] as property_1,
array_agg(name order by name)[safe_ordinal(2)] as name_2,
array_agg(property order by name)[safe_ordinal(2)] as property_2,
array_agg(name order by name)[safe_ordinal(3)] as name_3,
array_agg(property order by name)[safe_ordinal(3)] as property_3
from t
group by id;
Upvotes: 1
Reputation: 86715
The standard practice is to use conditional aggregation. That is, to use CASE
expressions to pick which row goes to which column, then MAX()
to collapse multiple rows into individual rows...
SELECT
id,
MAX(CASE WHEN name = 'C1' THEN name END) AS co1,
MAX(CASE WHEN name = 'C1' THEN property END) AS co1_property,
MAX(CASE WHEN name = 'C2' THEN name END) AS co2,
MAX(CASE WHEN name = 'C2' THEN property END) AS co2_property,
MAX(CASE WHEN name = 'C3' THEN name END) AS co3,
MAX(CASE WHEN name = 'C3' THEN property END) AS co3_property
FROM
yourTable
GROUP BY
id
Background info:
ELSE
in the CASE
expression implicitly means ELSE NULL
NULL
from every input row, except for the row being pivoted into that columnMAX()
essentially skip NULL
valuesMAX( {NULL,NULL,'xxx',NULL,NULL} )
therefore equals 'xxx'
A similar approach "bunches" the values to the left (so that NULL
values always only appears to the right...)
That approach first uses row_number()
to give each row a value corresponding to which column you want to put that row in to..
WITH
sorted AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS seq_num
FROM
yourTable
)
SELECT
id,
MAX(CASE WHEN seq_num = 1 THEN name END) AS co1,
MAX(CASE WHEN seq_num = 1 THEN property END) AS co1_property,
MAX(CASE WHEN seq_num = 2 THEN name END) AS co2,
MAX(CASE WHEN seq_num = 2 THEN property END) AS co2_property,
MAX(CASE WHEN seq_num = 3 THEN name END) AS co3,
MAX(CASE WHEN seq_num = 3 THEN property END) AS co3_property
FROM
yourTable
GROUP BY
id
Upvotes: 0