How can I pivot dataset in Google BigQuery?

Question

I have a massive dataset with this schema:

Customer    INTEGER
CategoryID  INTEGER
CategoryName    STRING
ProjectStage    INTEGER
NextStepID  INTEGER
NextStepName    STRING
NextStepIsAnchor    BOOLEAN

I heed to get the resulting set where each customer will be only on one row and his/her next steps will be in the columnts like this:

I tried to play with NTH function of BigQuery but it works only for the first occurrence of the NextStepID:

SELECT 
customer, 
nth(1, NextStepID)
FROM [2015_05.customers_wunique_nextsteps] 
group by customer

but when I try to add more columns:

SELECT 
customer, 
nth(1, NextStepID),
nth(2, NextStepID)
FROM [2015_05.customers_wunique_nextsteps] 
group by customer

I get this error:

Error: Function 'NTH(2, [NextStepID])' cannot be used in a distributed query, this function can only be correctly computed for queries that run on a single node.

Any ideas? Now I "pivot" the results with Excel and small VBA script, but when datasets grow bigger calculation time exceeds all limits...

Thanks in advance! :)

Mosha Pasumansky · Accepted Answer

Function NTH is applicable to REPEATED fields, where it chooses the nth repeating element (the error message can be improved). So first step would be to build REPEATED field out of NextStepID, and it can be done with NEST aggregation function. Then you can use NTH as scoped aggregation function:

SELECT
  Customer,
  NTH(1, NextStepID) WITHIN RECORD AS NextStepID1,
  NTH(2, NextStepID) WITHIN RECORD AS NextStepID2,
  NTH(3, NextStepID) WITHIN RECORD AS NextStepID3
FROM (
SELECT Customer, NEST(NextStepID) AS NextStepID
FROM [2015_05.customers_wunique_nextsteps] GROUP BY Customer)

How can I pivot dataset in Google BigQuery?

Answers (1)

Related Questions