Nehal Sateeshkumar
Nehal Sateeshkumar

Reputation: 33

Replace Traffic Source from raw Google analytics session data in Bigquery?

Recently we observed that when a user tries to complete a transaction on our website using an ios device. Apple ends the current session and begins a new session. The difficulty with this is that if the user came through paid source/email the current session ends and starts a new session with apple.com traffic source.

For Instance                                                         
google->appleid.apple.com                                                      
(direct)->appleid.apple.com                                              
email->appleid.apple.com                                                
ios->appleid.apple.com->appleid.apple.com->appleid.apple.com             

Since we have this raw data coming into BQ we are looking at replacing appleid.apple.com with their actual traffic Source i.e. google,direct,email,ios. Any help regarding the logic/function to workaround this problem will help?

This is the code I tried implementing:

WITH DATA AS (
SELECT 
PARSE_DATE("%Y%m%d",date) AS Date,
clientId as ClientId,
fullVisitorId AS fullvisitorid,
visitNumber  AS visitnumber,
trafficSource.medium as medium,
CONCAT(fullvisitorid,"-",CAST(visitStartTime AS STRING)) AS Session_ID,
trafficsource.source AS Traffic_Source,
MAX((CASE WHEN (hits.eventInfo.eventLabel="complete") THEN 1 ELSE 0 END)) AS ConversionComplete
FROM `project.dataset.ga_sessions_20*` 
,UNNEST(hits) AS hits
WHERE totals.visits=1 
GROUP BY 
1,2,3,4,5,6,7
),
Source_Replace AS (
SELECT 
Date AS Date,
IF(Traffic_Source LIKE "%apple.com" ,(CASE WHEN Traffic_Source NOT LIKE "%apple.com%" THEN LAG(Traffic_Source,1) OVER (PARTITION BY ClientId ORDER BY visitnumber ASC)end), Traffic_Source) AS traffic_source_1,
medium AS Medium,
fullvisitorid AS User_ID,
Session_ID AS SessionID,
ConversionComplete AS ConversionComplete
FROM 
DATA
)
SELECT 
Date AS Date,
traffic_source_1 AS TrafficSource,
Medium AS TrafficMedium,
COUNT(DISTINCT User_ID) AS Users,
COUNT(DISTINCT SessionID) AS Sessions,
SUM(ConversionComplete) AS ConversionComplete
FROM 
Source_Replace 
GROUP BY 
1,2,3

Thanks

Upvotes: 1

Views: 317

Answers (1)

saifuddin778
saifuddin778

Reputation: 7298

Does assuming the visitStartTime as key to identifying the session start help? Maybe something like:

source_replaced as (
   select *, 
      min(Traffic_Source) over (
         partition by date, clientid, fullvisitorid, visitnumber order by visitStartTime
      ) as originating_source
   from data
)

Then you can do your aggregation over the originating_source. Its kind of difficult without looking at some sample of data about whats going on.

Hope it helps.

Upvotes: 1

Related Questions