I have a very simple process whose first step is reading a BigQuery table:
p.apply("BigQuery data load", BigQueryIO.read().usingStandardSql().fromQuery(BG_SELECT).withoutValidation().withoutResultFlattening())
This step takes about 2 to 3 minutes to run (about 1000 rows retrieved)! When I look at BigQuery, I see several log lines linked to my query:
10:54:37.703 BigQuery delete temp_
10:54:37.244 BigQuery delete temp_
10:54:35.492 BigQuery jobcompleted
10:54:34.802 BigQuery insert jobs
10:54:22.081 BigQuery jobcompleted
10:52:33.812 BigQuery insert jobs
10:52:33.106 BigQuery insert datas
10:52:32.410 BigQuery insert jobs
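To see which step accounts for the 2 minutes, you can sort the log entries above oldest first and measure the gap between consecutive entries. The Java sketch below does this using only the standard library. The class and method names (`LogGaps`, `parse`, `parseAll`, `gaps`, `largestGapIndex`) are hypothetical. It also assumes every line has the `HH:mm:ss.SSS BigQuery <event>` shape shown in the question.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: measures the time between consecutive BigQuery log entries.
final class LogGaps {
    // The log entries quoted in the question, reordered oldest first.
    static final String[] QUESTION_LOG = {
        "10:52:32.410 BigQuery insert jobs",
        "10:52:33.106 BigQuery insert datas",
        "10:52:33.812 BigQuery insert jobs",
        "10:54:22.081 BigQuery jobcompleted",
        "10:54:34.802 BigQuery insert jobs",
        "10:54:35.492 BigQuery jobcompleted",
        "10:54:37.244 BigQuery delete temp_",
        "10:54:37.703 BigQuery delete temp_",
    };

    // One log entry: the time of day and the event text after "BigQuery ".
    static final class Entry {
        final LocalTime time;
        final String event;

        Entry(LocalTime time, String event) {
            this.time = time;
            this.event = event;
        }
    }

    // Assumes the line looks like "HH:mm:ss.SSS BigQuery <event>".
    static Entry parse(String line) {
        int space = line.indexOf(' ');
        LocalTime time = LocalTime.parse(line.substring(0, space));
        String event = line.substring(space + 1).replaceFirst("^BigQuery ", "");
        return new Entry(time, event);
    }

    static List<Entry> parseAll(String[] lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            entries.add(parse(line));
        }
        return entries;
    }

    // Elapsed time between each entry and the next; entries must be in chronological order.
    static List<Duration> gaps(List<Entry> entries) {
        List<Duration> result = new ArrayList<>();
        for (int i = 1; i < entries.size(); i++) {
            result.add(Duration.between(entries.get(i - 1).time, entries.get(i).time));
        }
        return result;
    }

    // Index of the longest gap; gap i lies between entries i and i + 1.
    static int largestGapIndex(List<Duration> gaps) {
        int best = 0;
        for (int i = 1; i < gaps.size(); i++) {
            if (gaps.get(i).compareTo(gaps.get(best)) > 0) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Entry> entries = parseAll(QUESTION_LOG);
        List<Duration> intervals = gaps(entries);
        for (int i = 0; i < intervals.size(); i++) {
            System.out.printf("%-14s -> %-14s %7d ms%n",
                entries.get(i).event, entries.get(i + 1).event, intervals.get(i).toMillis());
        }
        int worst = largestGapIndex(intervals);
        System.out.println("Longest gap ends at entry: " + entries.get(worst + 1).event);
    }
}
```

On the question's log, the longest gap is 108,269 ms (about 1 min 48 s). It falls between the second `insert jobs` entry and the first `jobcompleted` entry. Every other gap is under 13 seconds. Nearly all of the time is therefore spent waiting for that one job to finish, not in the rest of the read.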
Is 2 minutes for job completion normal? (There is no other activity on BigQuery in parallel.)
How can I get better (normal!) performance?
By default, BigQueryIO uses BATCH priority. BigQuery queues batch mode queries and starts them as soon as idle resources are available, usually within a few minutes.
You can explicitly set the priority to INTERACTIVE:
p.apply("BigQuery data load", BigQueryIO.readTableRows()
.withQueryPriority(BigQueryIO.TypedRead.QueryPriority.INTERACTIVE)
.usingStandardSql()
.fromQuery(BG_SELECT)
.withoutValidation()
.withoutResultFlattening())
In interactive mode, BigQuery executes the query as soon as possible instead of queuing it.