Andy Rojas
Andy Rojas

Reputation: 13

Flatten an array on bigquery

I'm using BigQuery and I have a table with nested fields which I need to flatten in order to be able to download the table. I found explanation page (https://cloud.google.com/life-sciences/docs/how-tos/flatten-bigquery-table) but I don't understand it fully.

Here's the code:

standardSQL

SELECT
      reference_name, start_position, end_position, reference_bases,
      call.name AS call_name
FROM
      `PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE` AS t,
      t.call AS call

I understand why we would write "call.name AS call_name". But what I don't understand why we write "t.call AS call"? What does the letter "t" stand for here? Why do we write t.call? Why do we need need to create an alias with "AS call"?

Thank you for your help!

Upvotes: 1

Views: 2882

Answers (3)

user_a27
user_a27

Reputation: 101

Try this link.

You can achieve this using unnest:

SELECT
  reference_name, start_position, end_position, reference_bases,
  call.name AS call_name
FROM
  `PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE`,
  UNNEST(call) AS call

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173028

But what I don't understand why we write "t.call AS call"? What does the letter "t" stand for here? Why do we write t.call?

Letter t is the alias for table in above row. In t.call it fully qualifies path to call inner table - call is repeated field so along with t it is treated as a 'virtual' table that you can use in joins - in this case it is CROSS JOIN (as comma is a shortcut to CROSS JOIN)

Why do we need need to create an alias with "AS call"?

In SELECT statement there is a reference to call.name which means that you output name field from that call 'virtual table'. This is important in cases when there is another name column let's say in main table, so you avoid ambiguity error
Note: if there is no any other column named name - you can skip call as in below example

SELECT
      reference_name, start_position, end_position, reference_bases,
      name AS call_name
FROM
      `PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE` AS t,
      t.call  

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270021

I'm baffled. "Flattening" refers to fields that are repeated -- arrays. You use unnest to flatten them. That would be:

SELECT reference_name, start_position, end_position, reference_bases,
       call.name AS call_name
FROM `PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE` t CROSS JOIN
      UNNEST(t.calls) AS call

This says that:

  • calls is a repeated field in your table.
  • calls is a struct.
  • calls has a column called name.

I renamed the call column calls because it contains multiple values -- and to distinguish it from call in the FROM clause.

The name column is being returned in the outer query.

Here is an example you can run:

with `PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE` as (
       select 'a' as reference_name, 1 as start_position, 5 as end_position, 'b' as reference_bases,
              array[struct('name1' as name),
                    struct('name2' as name),
                    struct('name3' as name)
                   ] calls                   
      )
SELECT reference_name, start_position, end_position, reference_bases,
       call.name AS call_name
FROM `PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE` t CROSS JOIN
      UNNEST(t.calls) AS call;

Upvotes: 1

Related Questions