Reputation: 487
I have a massive dataset with this schema:
Customer INTEGER
CategoryID INTEGER
CategoryName STRING
ProjectStage INTEGER
NextStepID INTEGER
NextStepName STRING
NextStepIsAnchor BOOLEAN
I heed to get the resulting set where each customer will be only on one row and his/her next steps will be in the columnts like this:
Customer | CategoryID | CategoryName | ProjectStage | NextStep1ID | NextStep1Name | NextStep2ID | NextStep2Name | ... etc.
I tried to play with NTH function of BigQuery but it works only for the first occurrence of the NextStepID:
SELECT
customer,
nth(1, NextStepID)
FROM [2015_05.customers_wunique_nextsteps]
group by customer
but when I try to add more columns:
SELECT
customer,
nth(1, NextStepID),
nth(2, NextStepID)
FROM [2015_05.customers_wunique_nextsteps]
group by customer
I get this error:
Error: Function 'NTH(2, [NextStepID])' cannot be used in a distributed query, this function can only be correctly computed for queries that run on a single node.
Any ideas? Now I "pivot" the results with Excel and small VBA script, but when datasets grow bigger calculation time exceeds all limits...
Thanks in advance! :)
Upvotes: 2
Views: 1039
Reputation: 14004
Function NTH is applicable to REPEATED fields, where it chooses the nth repeating element (the error message can be improved). So first step would be to build REPEATED field out of NextStepID, and it can be done with NEST aggregation function. Then you can use NTH as scoped aggregation function:
SELECT
Customer,
NTH(1, NextStepID) WITHIN RECORD AS NextStepID1,
NTH(2, NextStepID) WITHIN RECORD AS NextStepID2,
NTH(3, NextStepID) WITHIN RECORD AS NextStepID3
FROM (
SELECT Customer, NEST(NextStepID) AS NextStepID
FROM [2015_05.customers_wunique_nextsteps] GROUP BY Customer)
Upvotes: 4