Reputation: 25

flatten data in SQL based on fixed set of column

I am stuck with a specific scenario of flattening the data and need help for it. I need the output as flattened data where the column values are not fixed. Due to this I want to restrict the output to fixed set of columns.

Given Table 'test_table'

ID	Name	Property
1	C1	xxx
2	C2	xyz
2	C3	zz

The scenario is, column Name can have any no. of values corresponding to an ID. I need to flatten the data based in such a way that there is one row per ID field. Since the Name field varies with each ID, I want to flatten it for fix 3 columns like Co1, Co2, Co3. The output should look like

ID	Co1	Co1_Property	Co2	Co2_Property	Co3	Co3_Property
1	C1	xxx	null	null
2	C2	xyz	C3	zz

Could not think of a solution using Pivot or aggregation. Any help would be appreciated.

Upvotes: 0

Answers (3)

Mikhail Berlyant

Reputation: 172993

All current answers are too verbose and involve heavy repetition of same fragments of code again and again and if you need to account more columns you need to copy paste and add more lines which will make it even more verbose!
My preference is to avoid such type of coding and rather use something more generic as in below example

select * from (
  select *, row_number() over(partition by id) col
  from `project.dataset.table`)
pivot (max(name) as name, max(property) as property for col in (1, 2, 3))

If applied to sample data in your question - output is

If you want to change number of output columns - you just simply modify for col in (1, 2, 3) part of query.
For example if you would wanted to have 5 columns - you would use for col in (1, 2, 3, 4, 5) - that simple!!!

Upvotes: 1

Gordon Linoff

Reputation: 1269823

You can use arrays:

select id,
       array_agg(name order by name)[safe_ordinal(1)] as name_1,
       array_agg(property order by name)[safe_ordinal(1)] as property_1,
       array_agg(name order by name)[safe_ordinal(2)] as name_2,
       array_agg(property order by name)[safe_ordinal(2)] as property_2,
       array_agg(name order by name)[safe_ordinal(3)] as name_3,
       array_agg(property order by name)[safe_ordinal(3)] as property_3
from t
group by id;

Upvotes: 1

MatBailie

Reputation: 86715

The standard practice is to use conditional aggregation. That is, to use CASE expressions to pick which row goes to which column, then MAX() to collapse multiple rows into individual rows...

SELECT
  id,
  MAX(CASE WHEN name = 'C1' THEN name     END)  AS co1,
  MAX(CASE WHEN name = 'C1' THEN property END)  AS co1_property,
  MAX(CASE WHEN name = 'C2' THEN name     END)  AS co2,
  MAX(CASE WHEN name = 'C2' THEN property END)  AS co2_property,
  MAX(CASE WHEN name = 'C3' THEN name     END)  AS co3,
  MAX(CASE WHEN name = 'C3' THEN property END)  AS co3_property
FROM
  yourTable
GROUP BY
  id

Background info:

Not having an ELSE in the CASE expression implicitly means ELSE NULL
The intention is therefore for each column to recieve NULL from every input row, except for the row being pivoted into that column
Aggregates, such as MAX() essentially skip NULL values
MAX( {NULL,NULL,'xxx',NULL,NULL} ) therefore equals 'xxx'

A similar approach "bunches" the values to the left (so that NULL values always only appears to the right...)

That approach first uses row_number() to give each row a value corresponding to which column you want to put that row in to..

WITH
  sorted AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY name)  AS seq_num
  FROM
    yourTable
)
SELECT
  id,
  MAX(CASE WHEN seq_num = 1 THEN name     END)  AS co1,
  MAX(CASE WHEN seq_num = 1 THEN property END)  AS co1_property,
  MAX(CASE WHEN seq_num = 2 THEN name     END)  AS co2,
  MAX(CASE WHEN seq_num = 2 THEN property END)  AS co2_property,
  MAX(CASE WHEN seq_num = 3 THEN name     END)  AS co3,
  MAX(CASE WHEN seq_num = 3 THEN property END)  AS co3_property
FROM
  yourTable
GROUP BY
  id

Upvotes: 0

flatten data in SQL based on fixed set of column

Answers (3)

Related Questions