Reputation: 27
I have been trying to convert a BigQuery Legacy SQL query to Standard SQL, but I keep getting a lot of errors.
Here is the original Legacy SQL:
SELECT t.page_path,
t.second_page_path,
t.third_page_path,
t.fourth_page_path,
CONCAT(t.page_path,IF(t.second_page_path IS NULL,"","-"),
IFNULL(t.second_page_path,""),IF(t.third_page_path IS NULL,"","-"),
IFNULL(t.third_page_path,""),IF(t.fourth_page_path IS NULL,"","-"),
IFNULL(t.fourth_page_path,"")) AS full_page_journey,
count(sessionId) AS total_sessions
FROM (
SELECT
CONCAT(fullVisitorId,"-",STRING(visitStartTime)) AS sessionId,
hits.hitNumber,
hits.page.pagePath AS page_path,
LEAD(hits.page.pagePath) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS second_page_path,
LEAD(hits.page.pagePath,2) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS third_page_path,
LEAD(hits.page.pagePath,3) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS fourth_page_path
FROM
TABLE_DATE_RANGE( [xxxxxxx:xxxxxxx.ga_sessions_],
TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-02') )
WHERE
hits.type="PAGE"
) t
WHERE t.hits.hitNumber=1
GROUP BY t.page_path,
t.second_page_path,
t.third_page_path,
t.fourth_page_path,
full_page_journey
ORDER BY total_sessions DESC
UPDATED (edited): And here is what I have been able to do so far:
SELECT t.page_path,
t.second_page_path,
t.third_page_path,
t.fourth_page_path,
CONCAT(t.page_path,IF(t.second_page_path IS NULL,"","-"),
IFNULL(t.second_page_path,""),IF(t.third_page_path IS NULL,"","-"),
IFNULL(t.third_page_path,""),IF(t.fourth_page_path IS NULL,"","-"),
IFNULL(t.fourth_page_path,"")) AS full_page_journey,
count(sessionId) AS total_sessions
FROM (
SELECT
CONCAT(fullVisitorId,"-",cast(visitStartTime as string)) AS sessionId,
hits.hitNumber,
hits.page.pagePath AS page_path,
LEAD(hits.page.pagePath) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS second_page_path,
LEAD(hits.page.pagePath,2) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS third_page_path,
LEAD(hits.page.pagePath,3) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS fourth_page_path
FROM
`xxxxxxxxxxx.xxxxxxx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY)) AND
FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY))
AND hits.type = 'PAGE' ) AS t
WHERE t.hits.hitNumber = 1
GROUP BY t.page_path,
t.second_page_path,
t.third_page_path,
t.fourth_page_path,
full_page_journey
ORDER BY total_sessions DESC
It would be great if someone could help spot what is wrong with the syntax.
Some of the errors I get include:
Cannot access field hitNumber on a value with type ARRAY
Issues with "_TABLE_SUFFIX", which I read have to do with the wildcard.
Upvotes: 0
Views: 1933
Reputation: 1946
As a starting point, DATE_ADD needs a date but you're giving it a timestamp, and _TABLE_SUFFIX needs a string but you're giving it a date (kind of).
Try using CURRENT_DATE() and FORMAT_DATE around your existing syntax:
FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY))
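To check what those bounds evaluate to before putting them in the filter, a standalone query along these lines may help. This is only a sketch: the aliases start_suffix and end_suffix are made up for illustration, and DATE_SUB with a positive interval is used here as an equivalent of DATE_ADD with a negative one.

```sql
-- Preview the strings that _TABLE_SUFFIX would be compared against.
-- start_suffix / end_suffix are illustrative aliases, not part of the original query.
SELECT
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 16 DAY)) AS start_suffix,
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AS end_suffix
```

Both columns come back as STRING values in YYYYMMDD form, which is the type _TABLE_SUFFIX expects, so the BETWEEN comparison is string against string.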
This question might be useful for the hitNumber error:
query-hits-and-custom-dimensions-in-the-bigquery
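The ARRAY error generally means hits was referenced before being flattened. In Standard SQL, hits is a repeated field (an array of structs), so its fields are only reachable once the array has been passed through UNNEST. A minimal sketch with made-up inline data follows; the sessions CTE and its values are hypothetical and only stand in for a ga_sessions_ table.

```sql
-- Hypothetical one-row stand-in for a ga_sessions_ table.
WITH sessions AS (
  SELECT
    'visitor-1' AS fullVisitorId,
    [STRUCT(1 AS hitNumber, 'PAGE' AS type),
     STRUCT(2 AS hitNumber, 'EVENT' AS type)] AS hits
)
SELECT
  fullVisitorId,
  h.hitNumber,  -- works: h is a single STRUCT produced by UNNEST
  h.type
FROM sessions, UNNEST(hits) AS h
-- Selecting hits.hitNumber straight from sessions, without UNNEST,
-- fails because hits is still an ARRAY at that point.
```

Note also that once an inner query selects hits.hitNumber, the outer query sees that column simply as hitNumber, so it would be referenced as t.hitNumber rather than t.hits.hitNumber.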
Try using a CTE rather than a subquery, as it makes things clearer and easier to debug.
WITH CTE AS (
  SELECT
    CONCAT(fullVisitorId, "-", CAST(visitStartTime AS STRING)) AS sessionId,
    hits.hitNumber AS hitNumber,
    hits.page.pagePath AS page_path,
    LEAD(hits.page.pagePath) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS second_page_path,
    LEAD(hits.page.pagePath, 2) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS third_page_path,
    LEAD(hits.page.pagePath, 3) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS fourth_page_path
  FROM
    `xxxxxxxxxxx.xxxxxxx.ga_sessions_*`,
    UNNEST(hits) AS hits
  WHERE
    _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY)) AND
      FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY))
    AND hits.type = 'PAGE'
)
SELECT
  page_path,
  second_page_path,
  third_page_path,
  fourth_page_path,
  CONCAT(page_path, IF(second_page_path IS NULL, "", "-"),
    IFNULL(second_page_path, ""), IF(third_page_path IS NULL, "", "-"),
    IFNULL(third_page_path, ""), IF(fourth_page_path IS NULL, "", "-"),
    IFNULL(fourth_page_path, "")) AS full_page_journey,
  COUNT(sessionId) AS total_sessions
FROM CTE
WHERE hitNumber = 1
GROUP BY
  page_path,
  second_page_path,
  third_page_path,
  fourth_page_path,
  full_page_journey
ORDER BY total_sessions DESC
Upvotes: 3