Reputation: 1188
I'm developing a Node.js program. Every day we use Node to manipulate data stored in BigQuery.
Each day we receive a high volume of new data (280 GB).
How can I run a query over a whole day's data in BigQuery and stream the results row by row?
Right now we don't stream anything; we request all the data in a single query.
I could use the SQL LIMIT keyword, but BigQuery ignores LIMIT when calculating cost: a query with LIMIT 0,10 still scans all of the day's data (280 GB), and the same goes for LIMIT 10,10, and so on.
This is my current code.
const BigQuery = require('@google-cloud/bigquery');
// ... Some code ...
this.bigQuery
  .query(Exporter.enrichQueryWithOptions(`SELECT e.name FROM events`))
  .then(results => {
    // results[0] holds every row of the result set at once
    const rows = results[0];
    console.log(rows);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });
Upvotes: 4
Views: 4495
Reputation: 93
I think this might be what you need:
https://googleapis.dev/nodejs/bigquery/latest/BigQuery.html#createQueryStream
That method lets you run a query and consume its results through a readable stream, one row at a time.
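A minimal sketch of consuming rows one at a time. The helper name `processRowStream` is my own; the real entry point is `createQueryStream`, which returns an object-mode readable stream, so the helper works with any such stream:

```javascript
// Consume an object-mode row stream one row at a time. Works with any
// object-mode Readable, including the stream returned by BigQuery's
// createQueryStream().
function processRowStream(rowStream, onRow) {
  return new Promise((resolve, reject) => {
    rowStream
      .on('data', row => onRow(row)) // fired once per result row
      .on('error', reject)
      .on('end', resolve);           // all rows delivered
  });
}

// With the real client (assumes @google-cloud/bigquery is installed):
//   const {BigQuery} = require('@google-cloud/bigquery');
//   const stream = new BigQuery().createQueryStream('SELECT e.name FROM events');
//   processRowStream(stream, row => console.log(row.name));
```

Because rows arrive through `'data'` events, only a small number of rows are in memory at any moment, instead of the whole 280 GB result materializing at once.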
Upvotes: 3
Reputation: 1188
Finally, I just used BigQuery legacy SQL table decorators to select only the time interval I need. That way I can read just a part of my large table and pay only for that part.
https://cloud.google.com/bigquery/table-decorators
But note that decorators can only reference the last 7 days of data!
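For illustration, a small helper (the name `buildDecoratedQuery` is my own) that builds a legacy SQL query with a relative range decorator. `[table@-N-]` means "rows added between N milliseconds ago and now", and the query must be run with `useLegacySql: true`:

```javascript
// Build a legacy SQL query that scans only the rows added in the last
// `msAgo` milliseconds, via a relative range decorator: [table@-msAgo-].
// Run the result with bigQuery.query({query, useLegacySql: true}).
function buildDecoratedQuery(tableId, msAgo) {
  return `SELECT name FROM [${tableId}@-${msAgo}-]`;
}

// e.g. the last 24 hours of mydataset.events:
//   buildDecoratedQuery('mydataset.events', 24 * 60 * 60 * 1000)
```

BigQuery then bills only the bytes in that time window, not the full table.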
Upvotes: 0
Reputation: 111
As others pointed out, it is best if you can process everything in a BigQuery SQL statement.
However, if you have to process the data in your application, BigQuery provides a tabledata.list API to read data directly from a table.
https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/list
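A sketch of the request URL for paging through a table with tabledata.list (the helper name is my own; `maxResults` and `pageToken` are the documented paging parameters of that endpoint):

```javascript
// Build the REST URL for BigQuery's tabledata.list endpoint, which pages
// through a table's stored rows without running a query.
// `pageToken` comes from the previous response; omit it on the first call.
function tabledataListUrl(projectId, datasetId, tableId, {maxResults, pageToken} = {}) {
  const base = `https://bigquery.googleapis.com/bigquery/v2` +
    `/projects/${projectId}/datasets/${datasetId}/tables/${tableId}/data`;
  const params = new URLSearchParams();
  if (maxResults) params.set('maxResults', String(maxResults));
  if (pageToken) params.set('pageToken', pageToken);
  const qs = params.toString();
  return qs ? `${base}?${qs}` : base;
}
```

You loop: fetch a page, process its rows, then repeat with the `pageToken` from the response until no token is returned.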
Upvotes: 0