Indrid
Indrid

Reputation: 1192

Big Query Fill Forward

I have simple data in my BQ table:

Row make    model   cost    color
1   hhkb    hybrid  300     cream dark
2   durgod  K320    120     grey cream
3   durgod  hades   130     white black
4   rf      pf2     300     white black
5   drop    alt     200     white black silver bronze
6   idobao  id67    200     white black silver rust

What I want to get to is splitting the color column on spaces and having the data presented as denormalise; a row with all details for every color option. So it would look like this:

Row make    model   cost    color
1   hhkb    hybrid  300     cream 
2   hhkb                    dark
3   durgod  K320    120     grey 
4   durgod                  cream
5   durgod  hades   130     white 
6   durgod                  black
7   rf      pf2     300     white 
8   rf                      black
9   drop    alt     200     white 
10  drop                    black 
11  drop                    silver 
12  drop                    bronze
13  idobao  id67    200     white 
14  idobao                  black
15  idobao                  silver
16  idobao                  rust

I appreciate that whatever technique works for one colum, lets say 'make' can probably be repeated for other columns too, this is why in the example above I'm only asking about how to forward fill 'make' following a SPLIT on 'color' column.

What I have so far is:

SELECT
    make,
    model,
    cost,
    SPLIT(color, ' ') color,
    LAST_VALUE(make IGNORE NULLS) OVER (PARTITION BY color ORDER BY model ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS maker,
  
FROM
  `learnsql-xxxxxx.sampledata.keyboards`

LIMIT
  1000

But this only results in output like:

enter image description here

This makes me think that there must be something wrong in the way I am splitting the color column as its not resulting in new rows.

Would really appreciate if anyone could give me a shove in the right direction! Needless to say I am brand new to SQL.

UPDATE

After a lot of experimentation I came up with this:

WITH keybs AS (
    SELECT
        make AS make,
        model AS model,
        cost AS cost,
        SPLIT(color, ' ') color
    FROM
    `learnsql-xxxxxx.sampledata.keyboards`  
)

SELECT 
    keybs.make,
    keybs.model,
    keybs.cost,
    color
FROM keybs
    CROSS JOIN UNNEST(keybs.color) AS color
LIMIT
    1000

Which gives the required output. Is that the best way to do it?

Upvotes: 2

Views: 395

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Consider below approach

select t.* except(color), color
from `learnsql-xxxxxx.sampledata.keyboards` t
left join unnest(split(color, ' ')) color     

In addition to just be [significantly] less verbose - it uses left join instead of cross join. the reason is - if some of rows do not have colors (null value) - with cross join such row will be excluded from output, while with left join such row will be preserved

Upvotes: 1

Related Questions