Jeroen

Reputation: 49

How to specify a timeout when streaming rows into BigQuery?

Basically I want to stream some non-critical data into BigQuery during a very critical part of a system.

I want to specify a max timeout of about 2 seconds because I don't want to be blocking the process too long if there are any connectivity issues, or if BigQuery is not available (which has happened before, although I don't expect it to happen often).

I'm using the google/cloud library to connect to BigQuery, and am basically using the code found here: https://cloud.google.com/bigquery/streaming-data-into-bigquery

use Google\Cloud\BigQuery\BigQueryClient;

/**
 * Stream a row of data into your BigQuery table
 * Example:
 * ```
 * $data = [
 *     "field1" => "value1",
 *     "field2" => "value2",
 * ];
 * stream_row($projectId, $datasetId, $tableId, $data);
 * ```.
 *
 * @param string $projectId The Google project ID.
 * @param string $datasetId The BigQuery dataset ID.
 * @param string $tableId   The BigQuery table ID.
 * @param array  $data      An associative array representing a row of data.
 * @param string|null $insertId  An optional unique ID to guarantee data consistency.
 */
function stream_row($projectId, $datasetId, $tableId, $data, $insertId = null)
{
    // instantiate the bigquery table service
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);

    $insertResponse = $table->insertRows([
        ['insertId' => $insertId, 'data' => $data],
        // additional rows can go here
    ]);
    if ($insertResponse->isSuccessful()) {
        print('Data streamed into BigQuery successfully' . PHP_EOL);
    } else {
        foreach ($insertResponse->failedRows() as $row) {
            foreach ($row['errors'] as $error) {
                printf('%s: %s' . PHP_EOL, $error['reason'], $error['message']);
            }
        }
    }
}

I believe the library uses Guzzle as its HTTP client, but I don't know how to tell it to time out after a set number of seconds.

Upvotes: 2

Views: 1006

Answers (2)

Jeroen

Reputation: 49

It was a little unclear to me at first, but you can pass options, including a timeout, through to the Guzzle HTTP handler using the httpOptions option.

In the code snippet above, you would modify the $table->insertRows() call like so:

$insertResponse = $table->insertRows([
    ['insertId' => $insertId, 'data' => $data],
    // additional rows can go here
], ['httpOptions' => ['timeout' => $timeoutInSeconds]]);

Inside httpOptions you can specify any of the request options listed here: http://docs.guzzlephp.org/en/stable/request-options.html
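As a rough sketch of how this could be wrapped up for a critical code path: the two helpers below are hypothetical (neither is part of google/cloud). The first builds the options array with Guzzle's timeout and connect_timeout request options; the second runs any callable and swallows whatever it throws, on the assumption that a timeout surfaces as a thrown exception.

```php
<?php
// Hypothetical helper (not part of google/cloud): builds the array passed
// as the second argument to insertRows().
function build_insert_options(float $timeoutSeconds, float $connectTimeoutSeconds): array
{
    return [
        'httpOptions' => [
            // Guzzle: total time allowed for the whole request, in seconds.
            'timeout' => $timeoutSeconds,
            // Guzzle: time allowed to establish the connection, in seconds.
            'connect_timeout' => $connectTimeoutSeconds,
        ],
    ];
}

// Hypothetical wrapper: runs a non-critical operation and converts any
// thrown error (assumed to include timeouts) into a false return value,
// so the calling process can carry on.
function try_non_critical(callable $operation): bool
{
    try {
        $operation();
        return true;
    } catch (\Throwable $e) {
        error_log('Non-critical operation failed: ' . $e->getMessage());
        return false;
    }
}

// Possible usage with the $table, $insertId and $data from the question:
// $ok = try_non_critical(function () use ($table, $insertId, $data) {
//     $table->insertRows(
//         [['insertId' => $insertId, 'data' => $data]],
//         build_insert_options(2.0, 1.0)
//     );
// });
```

Note that a false return only means the call threw; rows rejected by BigQuery would still need to be checked through isSuccessful() as in the question.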

Nevertheless, Felipe's answer is probably still the better advice.

Upvotes: 1

Felipe Hoffa

Reputation: 59165

Best recommendation I can give you: Don't stream directly to BigQuery from a process you don't want to block. Set up a service in the middle that can take care of handling timeouts and retries, while leaving your main process unblocked.

Some options are:

You can see some of the architectural reasons for this in the posts Shine published 3 years ago: https://shinesolutions.com/2014/08/25/put-on-your-streaming-shoes/ and https://shinesolutions.com/2014/12/19/license-to-queue/.
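To illustrate the general idea of decoupling, here is a minimal sketch that uses a local spool file as a stand-in for the service in the middle. This is an assumption chosen for illustration, not one of the options from the answer: the critical process only appends a JSON line to a file, and a separate worker (not shown) would later read the rows and stream them to BigQuery with its own timeouts and retries. The function names and the file-based approach are hypothetical.

```php
<?php
// Hypothetical helper for the critical process: append one row as a JSON
// line to a local spool file instead of calling BigQuery directly.
function spool_row(string $spoolFile, array $data, ?string $insertId = null): bool
{
    $line = json_encode(['insertId' => $insertId, 'data' => $data]);
    if ($line === false) {
        return false; // the row could not be encoded
    }
    // LOCK_EX guards against interleaved writes from concurrent processes.
    return file_put_contents($spoolFile, $line . PHP_EOL, FILE_APPEND | LOCK_EX) !== false;
}

// Hypothetical helper for the separate worker: read the spooled rows back
// in the ['insertId' => ..., 'data' => ...] shape used by insertRows().
function read_spooled_rows(string $spoolFile): array
{
    if (!is_file($spoolFile)) {
        return [];
    }
    $lines = file($spoolFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    $rows = [];
    foreach ($lines as $line) {
        $row = json_decode($line, true);
        if (is_array($row)) {
            $rows[] = $row; // skip lines that are not valid JSON objects
        }
    }
    return $rows;
}
```

A worker could then pass the result of read_spooled_rows() to $table->insertRows() and truncate the file once the insert succeeds. A managed queue would play the same role with better durability.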

Upvotes: 2
