Reputation: 11
I linked my GA4 account with a BigQuery Project a month ago.
With the raw data, I'd like to get various data&insights including many features that GA currently offers. One of them is Conversion path. I'd like to see from which mediums/sources my users went through using conversion path. For this, traffic source or medium/campaign/source values are necessary. Nevertheless, the help page of Google Bigquery Export schema says that traffic source is "Name of the traffic source that first acquired the user". https://support.google.com/firebase/answer/7029846?hl=en
So for each user, the traffic sources appear always to be same, and I can't track the path of medium/source users who made the conversion. I also tried extracting utm code of page_location in event_param column of big query schema, but it doesn't seem to be correct.
For example, the first image is value counts of the last medium in conversion path of GA4. Second one is the value counts of last medium that I extracted from page_location/page_referrer of the BigQuery data. The period is same, hence the total number of conversion is same. The counts of mediums differs though.
value counts of the last medium in conversion path of GA4
Value counts of last medium that I extracted from page_location/page_referrer of the BigQuery data
My question is how I can get the traffic source of each session in the raw data of BigQuery?
Any idea or clue would be appreciated. Many thanks.
Upvotes: 1
Views: 6950
Reputation: 11
If you're still looking for the answer, here's how I've done it:
select
event_date,
user_pseudo_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer') as page_referrer,
CONCAT(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id,
CASE
WHEN ( SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') IS NULL THEN '(direct)'
ELSE ((select value.string_value from unnest(event_params) where key = 'medium' ))
END as session_medium,
CASE
WHEN ( SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') IS NULL THEN '(none)'
ELSE ((select value.string_value from unnest(event_params) where key = 'source'))
END as session_source,
CASE
WHEN ( SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') IS NULL THEN '(none)'
ELSE ((select value.string_value from unnest(event_params) where key = 'campaign'))
END as session_campaign,
CASE
WHEN ( SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'content') IS NULL THEN '(none)'
ELSE((select value.string_value from unnest(event_params) where key = 'content'))
END as session_content,
from
`ga4.analytics_308612563.events_*`
where event_name = "page_view"
and (select value.int_value from unnest(event_params) where key = 'entrances' ) = 1
So basically you're looking for that first page view by filtering for entrance = 1. For my use case, I used this query as a CTE, then did another CTE for other events I was looking for. THen joined both temp tables on the session id based on user_pseudo_id and ga_session_id.
Upvotes: 1
Reputation: 3379
Data from the GA4 export is event based, not session based, so it sounds like what you'd like to do is roll the event data up to a session content.
This query will do that:
SELECT
user_pseudo_id,
TIMESTAMP_MICROS(event_timestamp) AS session_start_ts,
CAST(LEAD(TIMESTAMP_MICROS(event_timestamp),1) OVER (PARTITION BY CONCAT(user_pseudo_id)
ORDER BY
event_timestamp) AS timestamp) AS session_end_ts,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') AS session_number,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer') AS referrer,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page_path,
(SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_title') AS landing_page_title,
traffic_source.name,
traffic_source.medium,
traffic_source.source,
CASE
WHEN device.category = "desktop" THEN "desktop"
WHEN device.category = "tablet" AND app_info.id IS NULL THEN "tablet-web"
WHEN device.category = "mobile" AND app_info.id IS NULL THEN "mobile-web"
WHEN device.category = "tablet" AND app_info.id IS NOT NULL THEN "tablet-app"
WHEN device.category = "mobile" AND app_info.id IS NOT NULL THEN "mobile-app"
END AS device,
device.mobile_brand_name,
device.mobile_model_name,
device.mobile_marketing_name,
device.mobile_os_hardware_model,
device.operating_system,
device.operating_system_version,
device.vendor_id,
device.advertising_id,
device.language,
device.is_limited_ad_tracking,
device.time_zone_offset_seconds,
device.browser,
device.browser_version,
device.web_info.browser,
device.web_info.browser_version,
device.web_info.hostname
FROM
`[my_project].analytics_[my_id].events*` s -- modify to your project
WHERE
event_name = 'session_start'
order by 1,2
LIMIT 500
Full credit for this code goes to the following article on Rittman Analytics, which I think would be a very beneficial read for you when starting out with the GA4 exports to BigQuery: https://rittmananalytics.com/blog/2021/7/25/event-based-analytics-and-bigquery-export-comes-to-google-analytics-4-how-does-it-worknbsp-and-whats-thenbspcatch
Ben
Upvotes: 0