Reputation: 856
I am currently working with the new Google PHP client library for BigQuery. It works well overall, but I am running into performance issues that I haven't been able to find anything about.
I'm using the new insertRows method to send data to BigQuery, and although I always receive a "bigquery#tableDataInsertAllResponse" response, I hardly ever actually see my data get inserted.
The datasets I send are usually about 30-100 items at a time. I looked up the BigQuery quotas, and I am not exceeding them; I'm not even close to the limits.
BigQuery does eventually insert some of the rows I send, but it takes incredibly long before it acknowledges that I sent data, and when it does, the data sits in the streaming buffer for days and in the end only part of it gets inserted.
This is the function I use to insert data:
public function insertInto($tableName, $datasetName, $arrayToInsert) {
    if (empty($arrayToInsert)) {
        return true;
    }

    $table = $this->getTable($tableName, $datasetName);
    $bqTable = $this->bigQueryIdController->getBigQueryTable($tableName, $datasetName);

    // Build insertIds from the locally stored table size plus the row index.
    $arraySize = count($arrayToInsert);
    $currentSize = $bqTable->getSize();
    for ($i = 0; $i < $arraySize; ++$i) {
        $arrayToInsert[$i]['insertId'] = $currentSize + 1 + $i;
    }

    $insertResponse = $table->insertRows($arrayToInsert);
    if (!$insertResponse->isSuccessful()) {
        $this->handleFailedRows($insertResponse->failedRows());
        return false;
    } else {
        // Insert succeeded: add the number of "inserted" rows to the stored size.
        $this->bigQueryIdController->updateBigQueryTable($bqTable, $arraySize);
        return true;
    }
}
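For reference, the rows I pass look roughly like this (the column names below are just placeholders); the library expects each row as a 'data' array, and insertInto() adds the 'insertId' next to it:

$rows = [
    [
        'data' => [
            'some_field'    => 'value',   // placeholder column names
            'another_field' => 42,
        ],
        // insertInto() sets 'insertId' here before calling insertRows()
    ],
    // ... 30-100 of these per call
];
$this->insertInto('my_table', 'my_dataset', $rows);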
Is there any way I can speed things up? I can't really use BigQuery right now because of how long it takes before my data is actually available.
Edit:
I have already found that rows can be dropped when duplicate insertIds are used, but I don't think that can happen here. Whenever I insert into a table, I read the table size from a value stored in my own database, not from BigQuery. If it isn't present yet, I fetch the size remotely and store it locally. Whenever an insertion is successful, I add the number of "inserted" items to that count and update the stored size, so the value in my database reflects how many items should be present.
Upvotes: 0
Views: 292
Reputation: 207982
Your insertId
is not unique if you run this multiple times: later requests end up with rows carrying the same insertId, which wipes out the previous rows from the streaming buffer. This also has a snowball effect: you constantly have data in the streaming buffer that never makes it out to storage, because it keeps being wiped out by a new request reusing the same insertIds.
Make sure insertId
is something unique per request, such as a value based on microtime; it should be very different for every request. Only in the rare case where you stream the same row from, say, 5 locations and want to keep just 1 of them should the insertId be shared, so that BigQuery retains only 1 of the 5 (that was just an example).
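A minimal sketch of what that could look like inside your insertInto() function (assuming each element of $arrayToInsert is one row array, as in your code): derive the insertId from something unique per request instead of from the table size.

// Sketch: give every row an insertId that is unique across requests.
$arraySize = count($arrayToInsert);
for ($i = 0; $i < $arraySize; ++$i) {
    // uniqid() with extra entropy plus the row index; bin2hex(random_bytes(16))
    // would work just as well. Anything that repeats across requests defeats the purpose.
    $arrayToInsert[$i]['insertId'] = uniqid('', true) . '-' . $i;
}
$insertResponse = $table->insertRows($arrayToInsert);

If you retry a failed request, reuse the insertIds you generated for that batch, since deduplication only helps when the retry carries the same ids.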
Upvotes: 0