Ben P
Ben P

Reputation: 3369

Ranking visits by a timestamp in StandardSQL

I am building a log of user interactions with a website, so far I have a row per visit showing the referral channel and a timestamp:

enter image description here

I'd like to rank each visit_ref by the date, so that the most recent have the highest rank and the furthest away have the lowest rank, within the date range that I query.

Here is my code so far, with channel removed to make it easier to read:

SELECT TIMESTAMP_SECONDS(visitStartTime) AS stamp, 
customDimension.value AS UserID,
CONCAT(CAST(fullVisitorId AS STRING),CAST(visitId AS STRING)) AS visit_ref,
COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING),CAST(visitId AS STRING))) OVER (PARTITION BY customDimension.value) AS total_visits_in_cycle,
RANK() OVER (PARTITION BY CONCAT(CAST(fullVisitorId AS STRING),CAST(visitId AS STRING)), TIMESTAMP_SECONDS(visitStartTime) ORDER BY TIMESTAMP_SECONDS(visitStartTime)) AS visitrank,
  COUNT(DISTINCT transaction.transactionid) AS orders

FROM `xxx.xxx.ga_sessions_20*` AS t
  CROSS JOIN UNNEST(hits) AS hits
  CROSS JOIN UNNEST(t.customdimensions) AS customDimension
WHERE parse_date('%y%m%d', _table_suffix) between 
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
AND customDimension.index = 2
GROUP BY 1,2,3, fullVisitorId, visitid, visitStartTime
ORDER BY UserID
LIMIT 500

In this example, by rank is always coming back as 1 as seen in the screenshot, how can I get a rank of the unique visit_ref by timestamp?

My desired output is below, where the visitrank shows 1 against the oldest visit and 3 against the newest, for this user:

2   2018-05-07 08:02:30.000 UTC 00008736-01f0-4e0e-8e3b-4dc398e5b6f8    74664051693279955771525680150   3   2   Email - CRM Campaigns   0    
3   2018-05-06 21:59:20.000 UTC 00008736-01f0-4e0e-8e3b-4dc398e5b6f8    74664051693279955771525643960   3   1   Email - CRM Campaigns   0    
4   2018-05-07 05:39:15.000 UTC 00008736-01f0-4e0e-8e3b-4dc398e5b6f8    74664051693279955771525671555   3   3   Email - CRM Campaigns   0    

RANK() OVER (PARTITION BY CONCAT(CAST(fullVisitorId AS STRING),CAST(visitId AS STRING)), TIMESTAMP_SECONDS(visitStartTime) ORDER BY TIMESTAMP_SECONDS(visitStartTime)) AS visitrank,

I am working with Google BigQuery StandardSQL.

Upvotes: 1

Views: 921

Answers (1)

Jeremy Fortune
Jeremy Fortune

Reputation: 2499

The partition window defines the subset of records that should be considered. By including TIMESTAMP_SECONDS(visitStartTime), you're setting the partition to what is always 1 record (though there might be more in your actual data on occasion) and you only ever see a rank of 1.

Also it isn't clear to me why you need to do the concat/casting in the partition definition, though perhaps there's some significant transformation occurring during this cast. I would use this:

rank() over (partition by fullVisitorId order by timestamp_seconds(visitStartTime) desc)

Upvotes: 1

Related Questions