Reputation: 187
I have some data that I am trying to aggregate (greatly simplified here). The raw data uses a schema similar to the following:
UserID - STRING
A - RECORD REPEATED
A.Action - STRING
A.Visit - INTEGER
A.Order - INTEGER
MISC - RECORD REPEATED
( other columns omitted here )
There are many actual records due to the "MISC" column, but I'm only trying to focus on the first 5 columns shown above. A sample of the raw data is shown below (note that the values shown are a sample only, many other values exist so these cannot be hard coded into the query) :
Table 0: (Raw data sample)
(empty values under UserID are as shown in BiqQuery - "A" fields are part of a nested record)
My query produces the data shown in Table 1 below. I am trying to use ARRAY_AGG with ORDINAL to select only the first two "Action"s for each user and restructure as shown in TABLE 2.
SELECT
UserId, ARRAY_AGG( STRUCT(A.Action, A.Visit, A.Order)
ORDER BY A.Visit, A.Order, A.Action )
FROM
`table`
LEFT JOIN UNNEST(A) AS A
GROUP BY
UserId
Table 1: (Sample output of above query )
Table 2: (The format needed)
So I need to:
My attempted query strategy was to ORDER BY UserID, Visit, Order and get DISTINCT values of Action using something like:
UserId,
ARRAY_AGG(DISTINCT Action ORDER BY UserID, Visit, Order) FirstAction,
ARRAY_AGG(DISTINCT Action ORDER BY UserID, Visit, Order) SecondAction
However, that approach produces the following error:
Error: An aggregate function that has both DISTINCT and ORDER BY arguments can only ORDER BY columns that are arguments to the function
Any thoughts on how to correct this error (or an alternative approach?)
Upvotes: 12
Views: 37675
Reputation: 59225
Not sure why the original query has DISTINCT
, if the results shown in table 2 don't need de-duplication.
With that said:
#standardSQL
WITH sample AS (
SELECT actor.login userid, type action
, EXTRACT(HOUR FROM created_at) visit
, EXTRACT(MINUTE FROM created_at) `order`
FROM `githubarchive.day.20171005`
)
SELECT userid, actions[OFFSET(0)] firstaction, actions[SAFE_OFFSET(1)] secondaction
FROM (
SELECT userid, ARRAY_AGG(action ORDER BY visit, `order` LIMIT 2) actions
FROM sample
GROUP BY 1
ORDER BY 1
LIMIT 100
)
Upvotes: 10
Reputation: 173046
Try below.
#standardSQL
SELECT UserId,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(1)] AS FirstAction,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(2)] AS SecondAction
FROM `project.dataset.table`
LEFT JOIN UNNEST(A) AS A
GROUP BY UserId
-- ORDER BY UserId
You can test / play with it using dummy data from your question
#standardSQL
WITH `table` AS (
SELECT 'U001' AS UserId, [STRUCT<Action STRING, Visit INT64, `Order` INT64 >
('Register', 1, 1),('Upgrade', 1, 2),('Feedback', 1, 3),('Share', 1, 4),('Share', 2, 1)] AS A UNION ALL
SELECT 'U002', [STRUCT<Action STRING, Visit INT64, `Order` INT64 >
('Share', 7, 1),('Share', 7, 2),('Refer', 8, 1),('Feedback', 8, 2),('Feedback', 8, 3)] UNION ALL
SELECT 'U003', [STRUCT<Action STRING, Visit INT64, `Order` INT64 >
('Register', 1, 1),('Share', 1, 2),('Share', 1, 3),('Share', 2, 1),('Share', 2, 2),('Share', 3, 1),('Share', 3, 2)]
)
SELECT UserId,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(1)] AS FirstAction,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(2)] AS SecondAction
FROM `table`
LEFT JOIN UNNEST(A) AS A
GROUP BY UserId
ORDER BY UserId
Upvotes: 3