Reputation: 21
I'm using big query and am trying to import custom dimensions along with noncustom dimensions. The analytics is sent from an app and basically I want a table with columns: UserID (custom dimension), platformID (custom dimension), ScreenName (basically app version of "Page name"), and date. The metric is "number of screenviews" grouped onto all of these dimensions. This is what it looks like below:
The photo of the GA report:
So, in bigquery, I could get numbers that checked out (when compared to GA report above) until I added in custom dimensions. Once I added custom dimensions, the numbers no longer made any sense.
I know that custom dimensions are nested within big query. So I made sure to use FLATTEN at first. Then I tried without flatten and got same results. The numbers make no sense (are hundreds of times larger than in GA interface).
My queries are below (one without FLATTEN and one with FLATTEN).
ps I ideally wanted to use
count(hits)
instead of
count(hits.appInfo.screenName)
But I kept getting an error when I selected hits in my subquery.
My query without flatten is below. If you could help me figure out why is it that once I add custom dimensions all data gets messed up
SELECT
date,
hits.appInfo.version,
hits.appInfo.screenName,
UserIdd,
platform,
count(hits.appInfo.screenName)
FROM (
SELECT
date,
hits.appInfo.version,
hits.appInfo.screenName,
max(case when hits.customdimensions.index = 5 then hits.customdimensions.value end) within record as UserIdd,
max(case when hits.customdimensions.index = 20 then hits.customdimensions.value end) within record as platform
FROM
TABLE_DATE_RANGE([fiery-cabinet-97820:87025718.ga_sessions_], TIMESTAMP('2017-04-04'), TIMESTAMP('2017-04-04'))
)
where UserIdd is not null
and platform = 'Android'
GROUP BY
1,
2,
3,
4,
5
ORDER BY
6 DESC
and here is my query with FLATTEN (same issue - numbers dont make sense)
SELECT
date,
hits.appInfo.version,
customDimensions.index,
customDimensions.value,
hits.appInfo.screenName,
UserIdd,
count(hits.appInfo.screenName)
FROM (FLATTEN(( FLATTEN((
SELECT
date,
hits.appInfo.version,
customDimensions.value,
customDimensions.index,
hits.appInfo.screenName,
max(case when hits.customdimensions.index = 5 then hits.customdimensions.value end) within record as UserIdd,
hits.type
FROM
TABLE_DATE_RANGE([fiery-cabinet-97820:87025718.ga_sessions_], TIMESTAMP('2017-04-04'), TIMESTAMP('2017-04-04'))), customDimensions.value)),hits.type))
WHERE
customDimensions.value = 'Android'
and customDimensions.index = 20
and UserIdd is not null
GROUP BY
1,
2,
3,
4,
5,
6
ORDER BY
7 DESC
Upvotes: 2
Views: 12847
Reputation: 715
I'm not positive that hits.customDimensions.*
will always have the user-scoped dimensions (and I'm guessing your userId metric is user-scoped).
Specifically, user-scoped dimensions should be queried from customDimensions
, not hits.customDimensions
.
Notionally, the first step is to make customDimensions
compatible with hits.*
via flattening or scoped aggregation. I'll explain the flattening approach.
GA records have the shape (customDimensions[], hits[], ...)
, which is no good for querying both fields. We begin by flattening these to (customDimensionN, hits[], ...)
.
One level up, by selecting fields under hits.*
, we implicitly flatten the table into (customDimensionN, hitN)
records. We filter these to include only the records matching (customDimension5, appviewN)
.
The last step is to count everything up.
SELECT date, v, sn, uid, COUNT(*)
FROM (
SELECT
date,
hits.appInfo.version v,
hits.appInfo.screenName sn,
customDimensions.value uid
FROM
FLATTEN((
SELECT customDimensions.*, hits.*, date
FROM
TABLE_DATE_RANGE(
[fiery-cabinet-97820:87025718.ga_sessions_],
TIMESTAMP('2017-04-04'),
TIMESTAMP('2017-04-04'))),
customDimensions)
WHERE hits.type = "APPVIEW" and customDimensions.index = 5)
GROUP BY 1,2,3,4
ORDER BY 5 DESC
Here's another equivalent approach. This uses the scoped aggregation trick that I've seen recommended in the GA BQ cookbook. Looking at the query explanation, however, the MAX(IF(...)) WITHIN RECORD
seems to be quite expensive, triggering an extra COMPUTE
and AGGREGATE
phase in the first stage. Bonus points for being a bit more digestible, though.
SELECT sn, uid, date, v, COUNT(*)
FROM (
SELECT
MAX(IF(customDimensions.index = 5, customDimensions.value, null)) within record as uid,
hits.appInfo.screenname as sn,
date,
hits.appInfo.version as v,
hits.type
FROM
TABLE_DATE_RANGE([fiery-cabinet-97820:87025718.ga_sessions_], TIMESTAMP('2017-04-04'), TIMESTAMP('2017-04-04')))
WHERE hits.type = "APPVIEW" and uid is not null
GROUP BY 1,2,3,4
ORDER BY 5 DESC
I'm not yet familiar with the Standard SQL dialect of BQ, but it seems that it would simplify this kind of wrangling. You might want to wrap your head around that if you'll be making many queries like this.
Upvotes: 7