James Weiss
James Weiss

Reputation: 1

Getting count and pageviews for homepage to other pages on site for every page by hour in BigQuery

I am extremely new to BigQuery and I am working on a project to identify trends by hour for users navigating from the home page to every page in our site based on GA data, I have started to piece together code for how to get from the home page to a page called "/wait-times", how can I do this for home to every other page combination on our site?

hourly_wait_times = pd.io.gbq.read_gbq('''

select 

visitStartTime_HOUR,    
visitStartTime_DAY,     
sum(waitTimes) waitTimes,   #
sum(allPageViews) allPageViews   #
from (
  SELECT
  date,
  EXTRACT(HOUR FROM 
    DATETIME(TIMESTAMP_SECONDS(
        visitStartTime),'America/New_York')) visitStartTime_HOUR,
  EXTRACT(DATE FROM 
    DATETIME(TIMESTAMP_SECONDS(
        visitStartTime),'America/New_York')) visitStartTime_DAY,
  case when h.page.pagePath = "/wait-times" then 1 else 0 end waitTimes,
  1 allPageViews,
  h.page.hostname,
  h.page.pagePath,

  LAG(h.page.pagePath, 1) 
    OVER (PARTITION BY fullVisitorId, visitId, DATE ORDER BY h.hitNumber ASC) AS PreviousPage,
  LAG(h.page.pagePath, 1) 
    OVER (PARTITION BY fullVisitorId, visitId, DATE ORDER BY h.hitNumber DESC) AS NextPage
  FROM `MY-GA-SESSION`.`ga_sessions_*`, unnest(hits) h 
  WHERE

  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH)) 
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  and h.type = 'PAGE'
  and h.page.hostname = 'www.myhostname'

  ) A

where PreviousPage = '/'
group by visitStartTime_DAY, visitStartTime_HOUR
order by visitStartTime_DAY, visitStartTime_HOUR
 ''',
project_id=project_id)

I would like the table to look like this, how can this be done?

+-------+------------+----------+------+-------+-----------+--+
| PageA |   PageB    |   date   | hour | count | pageviews |  |
+-------+------------+----------+------+-------+-----------+--+
| /     | /donations | 12/1/19  |    0 |     0 |       100 |  |
| /     |            |          |    1 |     4 |       254 |  |
| /     |            |          |  ... |       |           |  |
|       |            |          |   23 |     5 |       500 |  |
|       |            | ...      |      |       |           |  |
| /     |            | 3/1/19   |    0 |     1 |       183 |  |
|       |            |          |  ... |       |           |  |
| /     |            |          |      |       |           |  |
| /     |            |          |   23 |     6 |       600 |  |
| /     | /store     | 12/1/19  |    0 |     2 |       100 |  |
|       |            |          |  ... |       |           |  |
+-------+------------+----------+------+-------+-----------+--+

Upvotes: 0

Views: 208

Answers (1)

rtenha
rtenha

Reputation: 3628

I would probably structure it a bit like this.

with pages as (
  select date, visitID, visitStartTime, h.page.pagePath, h.hitNumber
  from `MY-GA-SESSION`.`ga_sessions_*`, unnest(hits) h 
  where h.type = 'PAGE'
    and _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH)) 
    and FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
),
temp as (
  select *,
    lead(pagePath,1) over (partition by visitID order by hitNumber asc) as nextPagePath, -- Get the next page path
    row_number() over (partition by visitID order by hitNumber asc) as pagenum -- will need later on in we want to filter to landing page only
  from pages
),
temp2 as (
  select
    pagePath as pageA,
    nextPagePath as pageB,
    EXTRACT(DATE FROM DATETIME(TIMESTAMP_SECONDS(visitStartTime),'America/New_York')) as date,
    EXTRACT(HOUR FROM DATETIME(TIMESTAMP_SECONDS(visitStartTime),'America/New_York')) as hour,
    count(*) as a_to_b_count
  from temp
  where pagePath = '/' --whatever your homepage (or pageA is)
    and pagenum = 1 -- if you want pageA as a landing page only, delete if you want all pageA->pageB visits
)
select pageA, pageB, array_agg(struct(date,hour,a_to_b_count) order by 1 asc, 2 asc) as visit_info
from temp2
group by 1,2

Note: I'm not exactly sure what you mean by pageviews in this context, my query focuses on the pageA->pageB transition. If you are looking for total pageviews associated after each A->B transition, I would wrap that in a CTE and then join it back in by visitID before aggregating.

Upvotes: 1

Related Questions