Reputation: 269
I have a table with around 50,000,000 records. I would like to fetch a single column of the whole table:
SELECT id FROM `project.dataset.table`
Running this code in the Web Console takes around 80 seconds.
However, when doing this with the Ruby gem, I'm limited to fetching only 100,000 records per request. With the #next
method I can access the next 100,000 records.
require "google/cloud/bigquery"
@big_query = Google::Cloud::Bigquery.new(
project: "project",
keyfile: "keyfile"
)
@dataset = @big_query.dataset("dataset")
@table = @dataset.table("table")
queue = @big_query.query("SELECT id FROM `project.dataset.table`", max: 1_000_000)
stash = queue
loop do
queue = queue.next
unless queue
break
else
O.timed stash.size
stash += queue
end
end
The problem is that each request takes around 30 seconds, and max: 1_000_000
has no effect: I'm still stuck at 100,000 records per page. At that rate the whole fetch takes over 4 hours, which is not acceptable.
What am I doing wrong?
Upvotes: 1
Views: 940
Reputation: 207952
You should rather run an export job; that way you get the data as file(s) on GCS, and downloading from there is easy.
https://cloud.google.com/bigquery/docs/exporting-data
There is a Ruby example here: https://github.com/GoogleCloudPlatform/google-cloud-ruby/blob/master/google-cloud-bigquery/lib/google/cloud/bigquery.rb
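For reference, a minimal sketch of such an export with the Ruby gem could look like the following; the bucket name and file pattern are placeholders, and extract_job assumes a reasonably recent version of google-cloud-bigquery:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new(
  project: "project",
  keyfile: "keyfile"
)
table = bigquery.dataset("dataset").table("table")

# Export the table to sharded, gzipped CSV files on GCS.
# "gs://your-bucket/ids-*.csv.gz" is a placeholder destination.
job = table.extract_job(
  "gs://your-bucket/ids-*.csv.gz",
  format:      "csv",
  compression: "GZIP"
)
job.wait_until_done!
raise job.error["message"] if job.failed?

Note that an extract job exports the whole table, so if you only need the id column you could first run a query job that writes SELECT id into a destination table and then export that table instead.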
Upvotes: 2