amysong
amysong

Reputation: 143

Transpose nested rows into columns in bigquery with google analytics data

I am interested in pulling visitors with custom dimension attributes, where each row is a unique fullvisitorid and columns are desired customdimension.values.

Using the london helmets as an example, here I am pulling visitors with the two custom dimensions I am interested in:

SELECT fullvisitorid, customDimensions.index, customDimensions.value
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
where customDimensions.index in (2,3)
group by fullvisitorid, customDimensions.index, customDimensions.value

It gives results like:

+---------------+------------------------+------------------------+
| fullvisitorid | customDimensions_index | customDimensions_value |
+---------------+------------------------+------------------------+
|             1 |                      2 | Bronze                 |
|             1 |                      3 | Yes                    |
|             2 |                      2 | Bronze                 |
|             2 |                      3 | No                     |
|             3 |                      2 | Bronze                 |
|             3 |                      3 | Yes                    |
|             4 |                      2 | Platinum               |
|             4 |                      3 | Yes                    |
+---------------+------------------------+------------------------+

I would like the values transposed, where customDimension_index 2 is color, and customDimension_value 3 is yesno, so results would look like this instead:

+---------------+----------+-------+
| fullvisitorid |  color   | yesno |
+---------------+----------+-------+
|             1 | Bronze   | Yes   |
|             2 | Bronze   | No    |
|             3 | Bronze   | Yes   |
|             4 | Platinum | Yes   |
+---------------+----------+-------+

I could pull one then the other separately and join on fullvisitorid, but hoping to be able to pull out data this way in a single step. Thanks!

Upvotes: 3

Views: 2782

Answers (2)

Felipe Hoffa
Felipe Hoffa

Reputation: 59225

2020-01 update: Update for #standard SQL

SELECT
  fullvisitorid,
  (SELECT value FROM UNNEST(customDimensions) WHERE index=2) color, 
  (SELECT value FROM UNNEST(customDimensions) WHERE index=3) yesno, 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`

Previously:

Mosha's answer is correct, but I wanted to add this one as it takes advantage of the nested nature of the GA records:

SELECT
  fullvisitorid,
  FIRST(IF(customDimensions.index=2, customDimensions.value, NULL)) WITHIN RECORD color, 
  FIRST(IF(customDimensions.index=3, customDimensions.value, NULL)) WITHIN RECORD yesno
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE customDimensions.index in (2,3)

Why: Instead of running a GROUP BY (which consumes resources as it has to look and group by any record that might have the same customerid), WITHIN RECORD looks only inside the individual rows.

If a customerid has more than one row (for example, they visited once Bronze/Yes and afterwards with Platinum/No), the results will emit every row and combination, instead of only the first one.

Upvotes: 5

Mosha Pasumansky
Mosha Pasumansky

Reputation: 14014

Here is the solution:

SELECT
  fullvisitorid,
  FIRST(IF(customDimensions.index=2, customDimensions.value, NULL)) color, 
  FIRST(IF(customDimensions.index=3, customDimensions.value, NULL)) yesno
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
where customDimensions.index in (2,3)
group by fullvisitorid 

It relies on the fact that any aggregation function, including FIRST, ignores NULLs

Upvotes: 3

Related Questions