jeremieca

Reputation: 1188

Stream a large BigQuery SELECT with Node?

I'm developing a Node.js program. Every day we use Node to manipulate data that is stored in BigQuery.

Each day we have a high volume of new data (280 GB).

How can I run a query over a full day of data on BigQuery and stream the result row by row?

Right now we don't stream anything; we just request all the data at once.

I could use the SQL LIMIT keyword, but BigQuery ignores LIMIT in cost calculation: LIMIT 0,10 still scans all the data of the day (280 GB), and the same goes for LIMIT 10,10, and so on.

This is my current code.

    const BigQuery = require('@google-cloud/bigquery');

    // ... Some code ...

    this.bigQuery
        .query(Exporter.enrichQueryWithOptions(`SELECT e.name FROM events`))
        .then(results => {
            const rows = results[0];
            console.log(rows);
        })
        .catch(err => {
            console.error('ERROR:', err);
        });

Upvotes: 4

Views: 4495

Answers (3)

luispablo

Reputation: 93

I think this might be what you need:

https://googleapis.dev/nodejs/bigquery/latest/BigQuery.html#createQueryStream

That method lets you run a query and consume the results as a stream of rows.
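A minimal sketch of how consumption could look. The `createQueryStream` method is from the linked docs; the query and the `consumeRowStream` helper are assumptions for illustration:

```javascript
// Sketch: consume a row stream one row at a time instead of buffering
// everything. Works with any object-mode readable stream of rows.
function consumeRowStream(rowStream, onRow) {
  return new Promise((resolve, reject) => {
    rowStream
      .on('data', onRow)    // fires once per result row
      .on('error', reject)
      .on('end', resolve);  // all rows have been consumed
  });
}

// With BigQuery it would look roughly like this (table name assumed):
// const {BigQuery} = require('@google-cloud/bigquery');
// const bigquery = new BigQuery();
// const stream = bigquery.createQueryStream('SELECT e.name FROM events e');
// consumeRowStream(stream, row => console.log(row.name));
```

Because rows arrive one `data` event at a time, memory stays flat regardless of result size.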

Upvotes: 3

jeremieca

Reputation: 1188

Finally, I just used BigQuery legacy SQL table decorators to select only the time interval I need. That way I can query just a part of my large table and pay only for that part.

https://cloud.google.com/bigquery/table-decorators

But note that decorators can only reference the last 7 days of data!
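For illustration, a range decorator in legacy SQL looks like this (the project, dataset, and table names are assumptions; timestamps are milliseconds since the epoch, and omitting the end time means "until now"):

```javascript
// Sketch: build a legacy SQL query with a range decorator that scans
// only roughly the last hour of the (assumed) events table.
const oneHourAgo = Date.now() - 60 * 60 * 1000; // ms since epoch
const query =
  `SELECT e.name FROM [myproject:mydataset.events@${oneHourAgo}-] e`;

// Decorators are a legacy SQL feature, so legacy SQL must be enabled,
// e.g. bigquery.query({query: query, useLegacySql: true})
```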

Upvotes: 0

Y Y

Reputation: 111

As others pointed out, it is best if you can do all the processing inside a BigQuery SQL statement.

However, if you have to process the data in your application, BigQuery provides the tabledata.list API to read rows from a table directly.

https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/list
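tabledata.list is a paginated API, so reading a large table means looping on the page token. A hedged sketch of that loop; `getPage` stands in for whatever fetches one page (for example a thin wrapper around tabledata.list, whose dataset and table names below are assumptions):

```javascript
// Generic pagination loop: keep requesting pages until the response
// no longer carries a nextPageToken.
// getPage(pageToken) is any async function returning one page of rows,
// e.g. a wrapper around tabledata.list (names assumed):
//   const {BigQuery} = require('@google-cloud/bigquery');
//   const table = new BigQuery().dataset('mydataset').table('events');
async function readAllRows(getPage) {
  const all = [];
  let pageToken;
  do {
    const {rows, nextPageToken} = await getPage(pageToken);
    all.push(...rows);          // process or accumulate this page
    pageToken = nextPageToken;  // undefined when the table is exhausted
  } while (pageToken);
  return all;
}
```

In a real application you would process each page as it arrives rather than accumulating everything, which keeps memory bounded like the streaming approach.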

Upvotes: 0
