user912830823
user912830823

Reputation: 1352

Bigquery resources exceeded during query execution when using analytic function

In my BQ standardsql query when i use few analytic functions ( https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#analytic-functions ) i get this error:

Resources exceeded during query execution. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

The query has few fields calculated more or less similar to this:

case when 1 = ROW_NUMBER() over (partition by Y,m,operatingSystem)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,m,operatingSystem)
else null end as NewUniqueVisitorsMonthlyOS

When i split the query and run each part one by one they all work well. However, I don't want to split the query into multiple ones because i need to have all the fields in one final BQ View.

Is there any way i can fix this error?

UPD: Here is an example of the query. When i add more fields it stops working with the above mentioned error.

SELECT 
distinct
Date
,channelGrouping
,country
,browser
,deviceCategory
,operatingSystem

#Visits by all dimensions
,count(distinct concat(fullvisitorid,cast(visitid as string))) 
over (partition by concat(Y,m,d),channelGrouping,country,browser,deviceCategory,operatingSystem)
as Visits 

#Daily Users Browser
,case when 1 = ROW_NUMBER() over (partition by Y,m,d,browser)
then count (distinct fullvisitorid) 
over (partition by Y,m,d,browser)
else null end as UniqueVisitorsDailyBrowser


#Weekly New Users
,case when 1 = ROW_NUMBER() over (partition by Y,U,channelGrouping)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,U,channelGrouping)
else null end as NewUniqueVisitorsWeeklyChannel

#Monthly New Users
,case when 1 = ROW_NUMBER() over (partition by Y,m,operatingSystem)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,m,operatingSystem)
else null end as NewUniqueVisitorsMonthlyOS

FROM GA_Export_Schema

Upvotes: 0

Views: 681

Answers (1)

Willian Fuks
Willian Fuks

Reputation: 11797

What I usually do when I need to answer several different questions with a single query is I try to use some UNION ALL operation with some key to distinguish the data.

I tested this query in our dataset based on your query:

SELECT
  date date,
  country country,
  channel channel,
  browser browser,
  cat cat,
  os os,
  MAX(all_keys_visits) all_keys_visits,
  MAX(browser_visits) browser_visits,
  MAX(week_new_channel_users) week_new_channel_users,
  MAX(month_new_os_users) month_new_os_users from(
  SELECT
    date,
    country,
    channel,
    browser,
    cat,
    os,
    visits AS all_keys_visits,
    MAX(CASE
        WHEN tmp = 'browser' THEN visits END) OVER(PARTITION BY browser) browser_visits,  MAX(CASE
        WHEN tmp = 'weekly_new_users' THEN visits END) OVER(PARTITION BY channel, week_date) week_new_channel_users,
    MAX(CASE
        WHEN tmp = 'monthly_new_users' THEN visits END) OVER(PARTITION BY os, month_date) month_new_os_users from(
          SELECT
            tmp,
            date,
            week_date,
            month_date,
            country,
            channel,
            browser,
            cat,
            os,
            visits  FROM (
              SELECT
                'all_visitors' AS tmp,
                date,
                FORMAT_DATE("%W", parse_DATE('%Y%m%d',  date)) week_date,
                FORMAT_DATE("%m", parse_DATE('%Y%m%d',  date)) month_date,
                geonetwork.country country,
                channelGrouping channel,
                device.browser browser,
                device.devicecategory cat,
                device.operatingSystem os,
                COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS string))) visits
              FROM `project_id.dataset_id.ga_sessions*`
              WHERE 1 = 1
              AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))  GROUP BY tmp,  date,  channel,  country,  browser,  cat,  os )
            UNION ALL (
              SELECT
               'browser' AS tmp,
                date,
                FORMAT_DATE("%W", parse_DATE('%Y%m%d',  date)) week_date,
                FORMAT_DATE("%m", parse_DATE('%Y%m%d',  date)) month_date,
                '' country,
                '' channel,
                device.browser browser,
                '' cat,
                '' os,
                COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS string))) visits
                FROM `project_id.dataset_id.ga_sessions*`  WHERE 1 = 1 AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
                GROUP BY tmp,  date,  channel,  country,  browser,  cat,  os )
            UNION ALL (
              SELECT
               'weekly_new_users' AS tmp,
                date,
                FORMAT_DATE("%W", parse_DATE('%Y%m%d',  date)) week_date,
                FORMAT_DATE("%m", parse_DATE('%Y%m%d',  date)) month_date,
                '' country,
                channelGrouping channel,
                '' browser,
                '' cat,
                '' os,
                COUNT(DISTINCT CASE
                   WHEN totals.newVisits = 1 THEN CONCAT(fullvisitorid, CAST(visitid AS string)) END) visits
              FROM `project_id.dataset_id.ga_sessions*`
              WHERE
              1 = 1
              AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
              AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
            GROUP BY
              tmp,
              date,
              channel,
              country,
              browser,
              cat,
              os )
          UNION ALL (
            SELECT
              'monthly_new_users' AS tmp,
              date,
              FORMAT_DATE("%W", parse_DATE('%Y%m%d',
                  date)) week_date,
              FORMAT_DATE("%m", parse_DATE('%Y%m%d',
                  date)) month_date,
              '' country,
              '' channel,
              '' browser,
              '' cat,
              device.operatingSystem os,
              COUNT(DISTINCT
                CASE
                  WHEN totals.newVisits = 1 THEN CONCAT(fullvisitorid, CAST(visitid AS string)) END) visits
            FROM
              `project_id.dataset_id.ga_sessions*`
            WHERE
              1 = 1
              AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
              AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
            GROUP BY
              tmp,
              date,
              channel,
              country,
              browser,
              cat,
              os ) ) )
GROUP BY
  date,
  country,
  channel,
  browser,
  cat,
  os
HAVING
  (country != ''
    AND channel != ''
    AND browser != ''
    AND cat != ''
    AND os != '')

Basically on each UNION I create a key and later on I made an aggregation based on that key and the values you want to analyze. After that, I just removed the fields created as empty strings.

I tried processing 30 days of data which consumed a few gigas and still it completed in less then 20 seconds so it might work for you as well (notice that here by running the unions separately and then aggregating later you end up working in a much smaller data volume which avoids resource depletion).

Upvotes: 0

Related Questions