user2951101
user2951101

Reputation: 11

How can I convert an integer array into one hot encoding in BigQuery?

I have a BQ table like this:

ID  index
A   [1, 3, 4]   
B   [2]
C   [0, 3]
D   [2, 3]

I hope to get a new table with a single one-hot encoded index column with a fixed length:

ID  index
A   [0, 1, 0, 1, 1]   
B   [0, 0, 1, 0, 0]
C   [1, 0, 0, 1, 0]
D   [0, 0, 1, 1, 0]

There is a similar problem here: Transform table to one-hot encoding for many rows, but that converted into multiple columns instead of a single column.

Upvotes: 1

Views: 329

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173046

Use below

#standardSQL
with temp as (
  select i from (
    select max(i) as max_index
    from `project.dataset.table` t,
    t.index i
  ), unnest(generate_array(0, max_index)) i
)
select id,  
  ( select array_agg(if(i = i1, 1, 0) order by i)
    from temp 
    left join t.index as i1
    on i = i1
  ) index
from `project.dataset.table` t     

If apply above to sample data in your question

with `project.dataset.table` as (
  select 'A' id, [1, 3, 4] index union all
  select 'B', [2] union all
  select 'C', [0, 3] union all
  select 'D', [2, 3] 
)       
  • output is

enter image description here

As you can see it produces one column - array of integers
In case if you want column to be of string type - use below instead of array_agg

string_agg(if(i = i1, '1', '0') order by i)    

in this case output will be (very similar)

enter image description here

Upvotes: 2

Related Questions