Reputation: 83
I'm using the BigQuery Java API to run ~1000 copy jobs simultaneously (with scala.concurrent.Future) with WriteDisposition WRITE_APPEND, but I'm getting
com.google.cloud.bigquery.BigQueryException: API limit exceeded: Unable to return a row that exceeds the API limits. To retrieve the row, export the table
I thought this was caused by too much concurrency, so I tried using Monix's Task to limit the parallelism to at most 20:
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
import scala.concurrent.Future

def execute(queries: List[Query]): Future[Seq[Boolean]] = {
  // Build one task per copy job, then group them into batches of 20.
  val tasks: Iterator[Task[List[Boolean]]] = queries
    .map(q => BqApi.copyTable(q, destinationTable))
    .sliding(20, 20)
    .map(Task.gather(_)) // each batch runs its tasks in parallel
  // Run the batches one after another and flatten the results.
  val results: Task[List[Boolean]] = Task.sequence(tasks)
    .map(_.flatten.toList)
  results.runAsync // requires the implicit Scheduler imported above
}
where BqApi.copyTable executes the query, copies the result to the destination table, and returns a Task[Boolean].
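For context, copyTable is shaped roughly like the sketch below (simplified; Query is my own wrapper type and query.sql is its hypothetical SQL accessor):

import com.google.cloud.bigquery._
import monix.eval.Task

object BqApi {
  private val bigquery: BigQuery = BigQueryOptions.getDefaultInstance.getService

  // Runs the query and appends its result to the destination table.
  def copyTable(query: Query, destinationTable: TableId): Task[Boolean] =
    Task {
      val config = QueryJobConfiguration.newBuilder(query.sql)
        .setDestinationTable(destinationTable)
        .setWriteDisposition(JobInfo.WriteDisposition.WRITE_APPEND)
        .build()
      // waitFor() blocks until the job completes or fails.
      val job = bigquery.create(JobInfo.of(config)).waitFor()
      job.getStatus.getError == null
    }
}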
The same exception still happens.
But if I change the WriteDisposition to WRITE_TRUNCATE, the exception goes away.
Can anyone help me understand what happens under the hood, and why the BigQuery API behaves like this?
Upvotes: 0
Views: 764
Reputation: 2893
This message is encountered when a query exceeds the maximum response size. Since copy jobs use jobs.insert, maybe you're hitting the maximum row size, which is listed in the query jobs limits. I suggest filing a bug on the BigQuery issue tracker describing the behavior you're seeing with the Java API.
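If you do need to retrieve rows that exceed the limit, the error message itself points at the workaround: export the table to Cloud Storage with an extract job instead of reading rows back through the API. A minimal sketch with the Java client (the dataset, table, and bucket names are placeholders):

import com.google.cloud.bigquery.{BigQuery, BigQueryOptions, TableId}

// Extract jobs write results straight to Cloud Storage,
// bypassing the row-size limit of the API read path.
val bigquery: BigQuery = BigQueryOptions.getDefaultInstance.getService
val table = bigquery.getTable(TableId.of("my_dataset", "my_table"))
val job = table.extract("NEWLINE_DELIMITED_JSON", "gs://my-bucket/export-*.json").waitFor()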
Upvotes: 1