Reputation: 49
Basically I want to stream some non-critical data into BigQuery during a very critical part of a system.
I want to specify a timeout of about 2 seconds, because I don't want to block the process for long if there are connectivity issues or if BigQuery is unavailable (which has happened before, although I don't expect it to happen often).
I'm using the google/cloud library to connect to BigQuery, and am basically using the code found here: https://cloud.google.com/bigquery/streaming-data-into-bigquery
use Google\Cloud\BigQuery\BigQueryClient;

/**
 * Stream a row of data into your BigQuery table
 * Example:
 * ```
 * $data = [
 *     "field1" => "value1",
 *     "field2" => "value2",
 * ];
 * stream_row($projectId, $datasetId, $tableId, $data);
 * ```.
 *
 * @param string $projectId The Google project ID.
 * @param string $datasetId The BigQuery dataset ID.
 * @param string $tableId   The BigQuery table ID.
 * @param array  $data      An associative array representing a row of data.
 * @param string $insertId  An optional unique ID to guarantee data consistency.
 */
function stream_row($projectId, $datasetId, $tableId, $data, $insertId = null)
{
    // Instantiate the BigQuery client.
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);
    $insertResponse = $table->insertRows([
        ['insertId' => $insertId, 'data' => $data],
        // additional rows can go here
    ]);
    if ($insertResponse->isSuccessful()) {
        print('Data streamed into BigQuery successfully' . PHP_EOL);
    } else {
        foreach ($insertResponse->failedRows() as $row) {
            foreach ($row['errors'] as $error) {
                printf('%s: %s' . PHP_EOL, $error['reason'], $error['message']);
            }
        }
    }
}
I believe their library uses Guzzle as an HTTP client, but I don't know how to pass along a timeout for the request.
Upvotes: 2
Views: 1006
Reputation: 49
It was a little unclear to me at first, but you can pass options through to the Guzzle HTTP handler using the httpOptions option, including a timeout.
In the code snippet above, you would modify the $table->insertRows() call as follows:
$insertResponse = $table->insertRows([
    ['insertId' => $insertId, 'data' => $data],
    // additional rows can go here
], ['httpOptions' => ['timeout' => $timeoutInSeconds]]);
There you can specify any of the request options listed here: http://docs.guzzlephp.org/en/stable/request-options.html
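For completeness, here is a minimal sketch of the whole call with a hard timeout, assuming a $timeoutInSeconds variable and a broad catch block (on a timeout, Guzzle throws an exception rather than returning an unsuccessful response, and the library may wrap it in its own exception type):

function stream_row_with_timeout($table, array $data, $insertId = null, $timeoutInSeconds = 2)
{
    try {
        $insertResponse = $table->insertRows([
            ['insertId' => $insertId, 'data' => $data],
        ], [
            // 'timeout' caps the total request time; 'connect_timeout' caps
            // only the connection phase (both are standard Guzzle options).
            'httpOptions' => [
                'timeout' => $timeoutInSeconds,
                'connect_timeout' => $timeoutInSeconds,
            ],
        ]);
        return $insertResponse->isSuccessful();
    } catch (\Exception $e) {
        // A timed-out request surfaces as an exception, not as a failed
        // response, so log it and let the critical path keep going.
        error_log('BigQuery insert skipped: ' . $e->getMessage());
        return false;
    }
}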
Nevertheless, Felipe's answer is probably still the better advice.
Upvotes: 1
Reputation: 59165
Best recommendation I can give you: Don't stream directly to BigQuery from a process you don't want to block. Set up a service in the middle that can take care of handling timeouts and retries, while leaving your main process unblocked.
There are several options for such a middle service.
You can see some architectural reasons for this approach in the posts Shine published three years ago: https://shinesolutions.com/2014/08/25/put-on-your-streaming-shoes/ and https://shinesolutions.com/2014/12/19/license-to-queue/.
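To make the idea concrete, here is a hypothetical sketch of such a middle layer (not from this answer): the critical process only enqueues the row locally, and a separate worker drains the queue into BigQuery, where timeouts and retries can't block anything. Redis (via the phpredis extension) and the queue name 'bq_rows' are assumptions made for this sketch.

// Critical path: enqueue locally and return immediately (no network call
// to BigQuery, so nothing here can block on connectivity issues).
function enqueue_row(\Redis $redis, array $data)
{
    $redis->rPush('bq_rows', json_encode($data));
}

// Separate worker process (e.g. run from cron): drain the queue and
// stream rows into BigQuery, retrying later on failure.
function drain_queue(\Redis $redis, $table)
{
    while (($json = $redis->lPop('bq_rows')) !== false) {
        $row = json_decode($json, true);
        $insertResponse = $table->insertRows([['data' => $row]]);
        if (!$insertResponse->isSuccessful()) {
            // Re-queue the row for a later attempt instead of dropping it.
            $redis->rPush('bq_rows', $json);
            break;
        }
    }
}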
Upvotes: 2