Mark

Reputation: 172

BigQuery Timeout Errors in R Loop Using bigrquery

I am running a query in a loop, once for each store in a dataframe. Typically there are about 70 stores, so each complete pass through the loop runs roughly 70 iterations.

Maybe 75% of the time this loop works all the way through with no errors.

About 25% of the time I get the following error during any one of the loop iterations:

Error in curl::curl_fetch_memory(url, handle = handle) : Timeout was reached

Then I have to figure out which iteration bombed and repeat the loop, excluding the iterations that already completed successfully.

I can't find anything on the web to help me understand what is causing this seemingly random error. Perhaps it is a BQ technical issue? There does not seem to be any relation between the failures and the size of the result set being returned.

Here is the part of my code that does the loop...again it works all the way through most of the time. The cartesian product across IDs is intentional, as I want every combination of each Test ID with all possible Control IDs within store.

sql <- "SELECT pstore as store, max(pretrips) as pretrips
        FROM analytics.campaign_ids
        group by 1 order by 1"

store_maxtrips <- query_exec(sql, project = project, max_pages = 1)
store_maxtrips

for (i in seq_along(store_maxtrips$store)) {

  #pull back all ids shopping in same primary store as each test ID with their pre metrics  
  sql<-paste("SELECT a.pstore as pstore, a.id as test_id, 
  b.id as ctl_id,
    (abs(a.zpbsales-b.zpbsales)*",wt_pb_sales,")+(abs(a.zcatsales-b.zcatsales)*",wt_cat_sales,")+
  (abs(a.zsales-b.zsales)*",wt_retail_sales,")+(abs(a.ztrips-b.ztrips)*",wt_retail_trips,") as zscore
  FROM analytics.campaign_ids a inner join analytics.pre_zscores b
  on a.pstore=b.pstore
  where a.id<>b.id and a.pstore=",store_maxtrips$store[i]," order by a.pstore, a.id, zscore")

  print(paste("processing store",store_maxtrips$store[i]))

  query_exec(sql,project=project,destination_table = "analytics.campaign_matches",
         write_disposition = "WRITE_APPEND", max_pages = 1)
  }
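In the meantime, one way to avoid manually hunting down the failed iteration is to wrap each call in `try()` and record which stores failed, then rerun only those. This is just an illustrative sketch (the variable `failed_stores` and the rebuilt `sql` comment are mine, not part of the original code):

```r
# Sketch: track failed iterations so only those stores need a rerun.
failed_stores <- c()

for (i in seq_along(store_maxtrips$store)) {
  # ... build sql for store i exactly as in the loop above ...

  result <- try(
    query_exec(sql, project = project,
               destination_table = "analytics.campaign_matches",
               write_disposition = "WRITE_APPEND", max_pages = 1),
    silent = TRUE
  )

  # try() returns an object of class "try-error" on failure
  if (inherits(result, "try-error")) {
    failed_stores <- c(failed_stores, store_maxtrips$store[i])
  }
}

failed_stores  # rerun the loop over just these stores
```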

Upvotes: 1

Views: 540

Answers (1)

Mark

Reputation: 172

Solved!

It turns out I was using query_exec, but I should have been using insert_query_job, since I do not want to retrieve any results. The errors were all happening while R tried to retrieve results from BigQuery, which I didn't want anyhow.

By using insert_query_job + wait_for(job) in my loop instead of the query_exec command, it eliminated all issues with the loop finishing.

I also needed to wrap the call in try() to circumvent some rare errors that still popped up with this approach. Thanks to MarkeD for this tip. So my final solution looked like this:

try(job <- insert_query_job(sql, project = project,
                            destination_table = "analytics.campaign_matches",
                            write_disposition = "WRITE_APPEND"))
wait_for(job)
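One caveat with the snippet above: if the `try()` fails, `job` is either undefined or still holds the previous iteration's job, so `wait_for(job)` can error or wait on the wrong job. A slightly more defensive variant (my sketch, not the poster's exact code) checks the `try()` result before waiting:

```r
# Sketch: only wait on the job if it was actually created this iteration.
job <- try(
  insert_query_job(sql, project = project,
                   destination_table = "analytics.campaign_matches",
                   write_disposition = "WRITE_APPEND"),
  silent = TRUE
)

if (!inherits(job, "try-error")) {
  wait_for(job)   # block until the BigQuery job finishes
}
```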

Thanks to everyone who commented and helped me research the issue.

Upvotes: 2
