Reputation: 33
Recently we observed that when a user tries to complete a transaction on our website using an ios device. Apple ends the current session and begins a new session. The difficulty with this is that if the user came through paid source/email the current session ends and starts a new session with apple.com traffic source.
For Instance
google->appleid.apple.com
(direct)->appleid.apple.com
email->appleid.apple.com
ios->appleid.apple.com->appleid.apple.com->appleid.apple.com
Since we have this raw data coming into BQ we are looking at replacing appleid.apple.com with their actual traffic Source i.e. google,direct,email,ios. Any help regarding the logic/function to workaround this problem will help?
This is the code I tried implementing:
WITH DATA AS (
SELECT
PARSE_DATE("%Y%m%d",date) AS Date,
clientId as ClientId,
fullVisitorId AS fullvisitorid,
visitNumber AS visitnumber,
trafficSource.medium as medium,
CONCAT(fullvisitorid,"-",CAST(visitStartTime AS STRING)) AS Session_ID,
trafficsource.source AS Traffic_Source,
MAX((CASE WHEN (hits.eventInfo.eventLabel="complete") THEN 1 ELSE 0 END)) AS ConversionComplete
FROM `project.dataset.ga_sessions_20*`
,UNNEST(hits) AS hits
WHERE totals.visits=1
GROUP BY
1,2,3,4,5,6,7
),
Source_Replace AS (
SELECT
Date AS Date,
IF(Traffic_Source LIKE "%apple.com" ,(CASE WHEN Traffic_Source NOT LIKE "%apple.com%" THEN LAG(Traffic_Source,1) OVER (PARTITION BY ClientId ORDER BY visitnumber ASC)end), Traffic_Source) AS traffic_source_1,
medium AS Medium,
fullvisitorid AS User_ID,
Session_ID AS SessionID,
ConversionComplete AS ConversionComplete
FROM
DATA
)
SELECT
Date AS Date,
traffic_source_1 AS TrafficSource,
Medium AS TrafficMedium,
COUNT(DISTINCT User_ID) AS Users,
COUNT(DISTINCT SessionID) AS Sessions,
SUM(ConversionComplete) AS ConversionComplete
FROM
Source_Replace
GROUP BY
1,2,3
Thanks
Upvotes: 1
Views: 317
Reputation: 7298
Does assuming the visitStartTime
as key to identifying the session start help? Maybe something like:
source_replaced as (
select *,
min(Traffic_Source) over (
partition by date, clientid, fullvisitorid, visitnumber order by visitStartTime
) as originating_source
from data
)
Then you can do your aggregation over the originating_source
. Its kind of difficult without looking at some sample of data about whats going on.
Hope it helps.
Upvotes: 1