David Comfort
David Comfort

Reputation: 153

BigQuery over-counting transactions

I am trying to get metrics (sessions, users, new users, entrances, bounces) as well as transactions for a specific source, a type of ad and a custom dimension. The sessions returned appear to be correct, but the transactions are duplicated. I am not sure how to mix custom dimensions at the hit level with metrics at the session level.

Any ideas how to change the query so I do not duplicate values?

#Standard-SQL
SELECT
  trafficSource.source AS source,
  trafficSource.adContent AS adContent,
  (CASE WHEN hcd.index = 10 THEN hcd.value END) Destination,
  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING) )) Sessions,
  COUNT(totals.transactions) transactions
FROM `cc-bigquery-venture.534939.ga_sessions_*`,
UNNEST(hits) as hit,
UNNEST(hit.customdimensions) as hcd
WHERE
_TABLE_SUFFIX BETWEEN '20170101' AND '20170131'
GROUP BY
  source,
  adContent,
  Destination
HAVING 
  Destination IS NOT NULL
AND
  Destination !=''
AND 
  source = 'dfa' AND adContent = 'vacationcore'
ORDER BY
  Destination ASC

Upvotes: 0

Views: 616

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173190

Below for BigQuery Standard SQL

#StandardSQL
SELECT
  trafficSource.source AS source,
  trafficSource.adContent AS adContent,
  (
    SELECT ANY_VALUE(hcd.value) 
    FROM UNNEST(hits) AS hit,
      UNNEST(hit.customdimensions) AS hcd
    WHERE hcd.index = 10
  ) Destination,
  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING) )) Sessions,
  SUM(totals.transactions) transactions
FROM `cc-bigquery-venture.534939.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170131'
GROUP BY source, adContent, Destination
HAVING Destination IS NOT NULL
AND Destination !=''
AND source = 'dfa' AND adContent = 'vacationcore'
ORDER BY Destination ASC

Upvotes: 2

Related Questions