hmims
hmims

Reputation: 539

Querying in Google bigQuery work slow

I have a table which contains nearly a million rows. Searching for a single value in it takes 5 sec and around 500 in 15 seconds. This is quite a long time. Please let me know how can I optimize the query?

My query is:

select a,b,c,d from table where a in ('a1','a2')

Job id : stable-apogee-119006:job_ClLDIUSdDLYA6tC2jfC5GxBXmv0

Upvotes: 2

Views: 1616

Answers (1)

Willian Fuks
Willian Fuks

Reputation: 11787

I'm not sure what you mean by "500 it takes 15 secs" but I ran some tests against our database trying to simulate what you are running and I got some similar results to yours

(my query is slower then yours as it has a join operation but still here we go):

SELECT
  a.fv fv,
  a.v v,
  a.sku sku,
  a.pp pp from(
  SELECT
    fullvisitorid fv,
    visitid v,
    hits.product.productsku sku,
    hits.page.pagepath pp
  FROM (TABLE_DATE_RANGE([40663402.ga_sessions_], DATE_ADD(CURRENT_DATE(), -3, 'day'), DATE_ADD(CURRENT_DATE(), -3, 'day')))
  WHERE
    1 = 1 ) a
JOIN EACH (
  SELECT
    fullvisitorid fv,
  FROM (TABLE_DATE_RANGE([40663402.ga_sessions_], DATE_ADD(CURRENT_DATE(), -3, 'day'), DATE_ADD(CURRENT_DATE(), -3, 'day')))
  GROUP EACH BY
    fv
  LIMIT
    1 ) b
ON
  a.fv = b.fv

Querying for just one day and bringing just one fullvisitor took BQ roughly 5 secs to process 1.7 GBs.

And when I ran the same query for the last month and removed the limit operator it took ~10s to process ~56GB of data (around 33 million rows):

enter image description here

This is insanely fast.

So you might have to evaluate your project specs. If 5 secs is still too much for you then maybe you'll need to find some other strategy in your architecture that suits you best.

BigQuery does consume seconds to process its demands but it's also ready to process hundreds of Gigas still in seconds.

If your project data consumption is expected to grow and you will start processing millions of rows then you might evaluate if waiting a few secs is still acceptable in your application.

Other than that, as far as your query goes, I don't think there's much optimization left to improve its performance.

(ps: I decided to run for 100 days and it processed around 100 GBs in 14s.)

Upvotes: 2

Related Questions