Reputation: 13
I'm using BigQuery and I have a table with nested fields which I need to flatten in order to be able to download the table. I found explanation page (https://cloud.google.com/life-sciences/docs/how-tos/flatten-bigquery-table) but I don't understand it fully.
Here's the code:
standardSQL
SELECT
reference_name, start_position, end_position, reference_bases,
call.name AS call_name
FROM
`PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE` AS t,
t.call AS call
I understand why we would write "call.name AS call_name". But what I don't understand why we write "t.call AS call"? What does the letter "t" stand for here? Why do we write t.call? Why do we need need to create an alias with "AS call"?
Thank you for your help!
Upvotes: 1
Views: 2882
Reputation: 101
Try this link.
You can achieve this using unnest:
SELECT
reference_name, start_position, end_position, reference_bases,
call.name AS call_name
FROM
`PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE`,
UNNEST(call) AS call
Upvotes: 1
Reputation: 173028
But what I don't understand why we write "t.call AS call"? What does the letter "t" stand for here? Why do we write t.call?
Letter t
is the alias for table in above row. In t.call
it fully qualifies path to call
inner table - call is repeated field so along with t
it is treated as a 'virtual' table that you can use in joins - in this case it is CROSS JOIN (as comma is a shortcut to CROSS JOIN)
Why do we need need to create an alias with "AS call"?
In SELECT statement there is a reference to call.name which means that you output name field from that call 'virtual table'. This is important in cases when there is another name column let's say in main table, so you avoid ambiguity error
Note: if there is no any other column named name - you can skip call as in below example
SELECT
reference_name, start_position, end_position, reference_bases,
name AS call_name
FROM
`PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE` AS t,
t.call
Upvotes: 1
Reputation: 1270021
I'm baffled. "Flattening" refers to fields that are repeated -- arrays. You use unnest
to flatten them. That would be:
SELECT reference_name, start_position, end_position, reference_bases,
call.name AS call_name
FROM `PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE` t CROSS JOIN
UNNEST(t.calls) AS call
This says that:
calls
is a repeated field in your table.calls
is a struct.calls
has a column called name
.I renamed the call
column calls
because it contains multiple values -- and to distinguish it from call
in the FROM
clause.
The name column is being returned in the outer query.
Here is an example you can run:
with `PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE` as (
select 'a' as reference_name, 1 as start_position, 5 as end_position, 'b' as reference_bases,
array[struct('name1' as name),
struct('name2' as name),
struct('name3' as name)
] calls
)
SELECT reference_name, start_position, end_position, reference_bases,
call.name AS call_name
FROM `PROJECT_ID.BIGQUERY_DATASET.BIGQUERY_TABLE` t CROSS JOIN
UNNEST(t.calls) AS call;
Upvotes: 1