Reputation: 11
I have a BQ table like this:
ID index
A [1, 3, 4]
B [2]
C [0, 3]
D [2, 3]
I hope to get a new table with a single one-hot encoded index column with a fixed length:
ID index
A [0, 1, 0, 1, 1]
B [0, 0, 1, 0, 0]
C [1, 0, 0, 1, 0]
D [0, 0, 1, 1, 0]
There is a similar problem here: Transform table to one-hot encoding for many rows, but that converted into multiple columns instead of a single column.
Upvotes: 1
Views: 329
Reputation: 173046
Use below
#standardSQL
with temp as (
select i from (
select max(i) as max_index
from `project.dataset.table` t,
t.index i
), unnest(generate_array(0, max_index)) i
)
select id,
( select array_agg(if(i = i1, 1, 0) order by i)
from temp
left join t.index as i1
on i = i1
) index
from `project.dataset.table` t
If apply above to sample data in your question
with `project.dataset.table` as (
select 'A' id, [1, 3, 4] index union all
select 'B', [2] union all
select 'C', [0, 3] union all
select 'D', [2, 3]
)
As you can see it produces one column - array of integers
In case if you want column to be of string type - use below instead of array_agg
string_agg(if(i = i1, '1', '0') order by i)
in this case output will be (very similar)
Upvotes: 2