Stéphane Loison

Reputation: 33

Beam direct-runner slow BigQuery read

I have a very simple pipeline whose first step is to read a BigQuery table:

p.apply("BigQuery data load", BigQueryIO.read().usingStandardSql().fromQuery(BG_SELECT).withoutValidation().withoutResultFlattening())

This step takes about 2 to 3 minutes to run (for about 1000 rows retrieved)! When I look at BigQuery, I see multiple entries linked to my query:

10:54:37.703 BigQuery delete temp_
10:54:37.244 BigQuery delete temp_
10:54:35.492 BigQuery jobcompleted
10:54:34.802 BigQuery insert jobs 
10:54:22.081 BigQuery jobcompleted
10:52:33.812 BigQuery insert jobs 
10:52:33.106 BigQuery insert datas
10:52:32.410 BigQuery insert jobs 

Is it normal for the job to take 2 minutes to complete? (I have no parallel activity on BigQuery.)

How can I get better (normal!) performance?

Upvotes: 2

Views: 437

Answers (1)

medvedev1088

Reputation: 3745

By default, BigQueryIO runs queries with BATCH priority. BigQuery queues batch queries and starts them as soon as idle resources are available, usually within a few minutes.
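The timestamps in the question's log seem consistent with this. Below is a rough sketch (plain Java, no Beam dependency) that measures the gaps between them. The class name `LogGaps` and the helper `millisBetween` are made up for illustration, and the pairing of "insert jobs" entries with "jobcompleted" entries is my assumption, since the log does not show job IDs.

```java
import java.time.Duration;
import java.time.LocalTime;

// Sketch: measure the gaps between timestamps copied from the log in the question.
class LogGaps {

    // Milliseconds elapsed between two same-day timestamps such as "10:52:33.812".
    static long millisBetween(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).toMillis();
    }

    public static void main(String[] args) {
        // Assumption: the "insert jobs" entry at 10:52:33.812 and the
        // "jobcompleted" entry at 10:54:22.081 belong to the same job.
        long firstJob = millisBetween("10:52:33.812", "10:54:22.081");

        // Assumption: the later "insert jobs" / "jobcompleted" pair is a second job.
        long secondJob = millisBetween("10:54:34.802", "10:54:35.492");

        // Earliest to latest entry in the log.
        long total = millisBetween("10:52:32.410", "10:54:37.703");

        System.out.println("first job:  " + firstJob + " ms");   // 108269 ms
        System.out.println("second job: " + secondJob + " ms");  // 690 ms
        System.out.println("total:      " + total + " ms");      // 125293 ms
    }
}
```

Under those assumptions, the first job accounts for roughly 108 of the roughly 125 seconds, while the second one finishes in under a second. That pattern looks more like time spent waiting in a queue than time spent processing about 1000 rows, although the log alone cannot prove it.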

You can explicitly set the priority to INTERACTIVE.

p.apply("BigQuery data load", BigQueryIO.readTableRows()
    .withQueryPriority(BigQueryIO.TypedRead.QueryPriority.INTERACTIVE)
    .usingStandardSql()
    .fromQuery(BG_SELECT)
    .withoutValidation()
    .withoutResultFlattening())

Interactive mode allows BigQuery to execute the query as soon as possible.

Upvotes: 1
