dengar81
dengar81

Reputation: 2525

"Flatten" a table's repeated field(s) and aggregate

Although I am quite used to PostgreSQL and many other SQL dialects, this thing sorta stumps me:

I have a BigQuery table that looks something like:

orders
|- orderId
|- orderStatus
|- orderLines
   |- sku
   |- price_per_item
   |- quantity

In the standard SQL I would have done:

select orderLines.sku, sum(orderLines.price_per_item * quantity)
from flatten(orders, orderLines.sku) o
where orderStatus = 'valid'

but "flatten" doesn't work in standard SQL.

So I can do something like this:

select array(select sku FROM UNNEST(orderLines)) sku, array(select price_per_item from unnest(orderLines)) revenue
from orders

However, I cannot sum this now, ie:

select array(select sku FROM UNNEST(orderLines)) sku, sum(array(select price_per_item from unnest(orderLines))) revenue
from orders
group by sku

And I tried to do this with a 'with' statement to pre-create the table. But the result is the same.

What is the right approach and why does this seem needlessly verbose?

I am kinda annoyed having to use legacy SQL, because I also use a function in a join and that only works in standard SQL.

Upvotes: 1

Views: 1028

Answers (1)

Elliott Brossard
Elliott Brossard

Reputation: 33745

If you're familiar with arrays in PostgreSQL, you may have used the UNNEST operator before. You need to use it in this case to join the array with the table itself, flattening the repetition:

select orderLine.sku, sum(orderLine.price_per_item * quantity)
from orders, UNNEST(orderLines) AS orderLine
where orderStatus = 'valid'
GROUP BY sku

(I added the GROUP BY, since it looked like that was missing). For more information on arrays, including examples using UNNEST, see the documentation. If you are used to using legacy SQL in BigQuery, there is a migration guide that describes differences in flattening between legacy and standard SQL in BigQuery among other topics.

Upvotes: 2

Related Questions