Brandon
Brandon

Reputation: 37

BigQuery select statement within table_date_range cannot find schema

I have created a summary table in BigQuery that has summary data for each day (Eg. 20151127). What I'd like to do is to take a look at the last summary data that I have and use that as my starting point for new data to search for. For example, if the last date that I have is 2015-11-27, I want to add a day to it and then search for GA data from 2015-11-28 until today.

The way that I was hoping to do this is with the table_date_range function by supplying a select. Eg.

select
*
from
FROM
    TABLE_DATE_RANGE(
        [id::projectId.ga_sessions_]
        ,(
            select 
                date_add(
                    timestamp(max(session_date))
                    , 1
                    , 'DAY'
                )
            from
                [id::projectId.daily_pageview_data]
        )
        ,CURRENT_TIMESTAMP()
    )

I'm not able to get this select statement to work with BigQuery. Does anyone know a way to query like this? I'm trying to do this because I want to refresh my summary table each day, but I don't want to have to re-query all of the data each time I run the script.

Upvotes: 2

Views: 210

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

You will need one luxury table - calendar_dates with only dates in it, like below

2015-11-24   
2015-11-25   
2015-11-26   
2015-11-27   
2015-11-28   
2015-11-29   
2015-11-30   
2015-12-01   
2015-12-02   
2015-12-03   
2015-12-04  

Than, below query does what you asked

SELECT 
  *
FROM (
  TABLE_QUERY(YourDataset,
    'table_id IN (
        SELECT 
          "ga_sessions_" + REPLACE(DATE(DATE_ADD(TIMESTAMP(DAY), 1, 'DAY')), '-', '')
        FROM temp.calendar_dates AS days
        CROSS JOIN (
          SELECT MAX(session_date) max_date
          FROM YourDataset.daily_pageview_data
        ) max_avail_day
        WHERE DAY BETWEEN max_date AND CURRENT_DATE()    
    )'
  )
)

To create luxury table - calendar_dates - you can use below script with :var_start and :var_end set to reflect your needs. if for some reason you dont want to have extra table - you can embed below code into above one in place of temp.calendar_dates reference

SELECT DATE(DATE_ADD(TIMESTAMP(:var_start), pos - 1, "DAY")) as day
FROM (
     SELECT ROW_NUMBER() OVER() AS pos, *
     FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(:var_end), TIMESTAMP(:var_start)), '.'),'') AS h
     FROM (SELECT NULL)),h
)))

Upvotes: 2

Related Questions