Justine
Justine

Reputation: 115

Querying nested data in bigquery

I'm trying to extract all columns in bigquery google analytics data WHERE I filtered by hostname. However, the result is still showing the other hostnames as seen on screenshot. I'm doing this select with a filter to copy data to another dataset with the same schema. enter image description here

QUERY BELOW:

SELECT g.* FROM `10241241.ga_sessions_20160805` g, UNNEST (hits) as hits WHERE hits.page.hostname= "www.googlemerchandisestore.com"

UPDATED How can I copy the same schema without changing its nested structure?

error in nested schema

Upvotes: 0

Views: 1686

Answers (2)

Martin Weitzmann
Martin Weitzmann

Reputation: 4736

The values you want to filter for are sitting in an array ... so you can filter/clean the array or exclude rows with arrays not containing that value or exclude rows that contain not only that value. Not sure which applies for you, since you don't seem to have a reason to cross join the array with the whole table (which bloats it up tremendously).

Cleaning the array:

SELECT 
  * 
  REPLACE( 
    ARRAY(SELECT AS STRUCT * FROM g.hits WHERE page.hostname='www.googlemerchandisestore.com') AS hits
    ) 
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20160805` g
limit 1000

Exclude rows that don't contain the value

SELECT 
  * 
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20160805` g
WHERE
  (SELECT COUNT(1)>0 FROM g.hits WHERE page.hostname='www.googlemerchandisestore.com')
limit 1000

etc.

Upvotes: 2

Sabri Karagönen
Sabri Karagönen

Reputation: 2365

You shouldn't add hits column again. Also, I'd recommend you to use a different name for unnested column. It'll be easier to understand, then.

SELECT g.* EXCEPT(hits), hit 
FROM `10241241.ga_sessions_20160805` g, UNNEST (hits) as hit 
WHERE hits.page.hostname= "www.googlemerchandisestore.com"

Upvotes: 0

Related Questions