Reputation: 1
I'm using the BigQuery Java API to run a query that looks like this:
select * from <TABLE> where Hour >= timestamp('2020-05-01 00:00:00') and Hour <= timestamp('2020-05-02 00:00:00') and <COLUMN> IN (select <COLUMN> from <OTHER_TABLE> limit 1028) limit 1
I've observed that not all tasks are completed when the job is marked as done, as shown in the job statistics below:
"statementType": "SELECT",
"timeline": [
{
"activeUnits": "1348",
"completedUnits": "245",
"elapsedMs": "953",
"pendingUnits": "13270",
"totalSlotMs": "11681"
},
{
"activeUnits": "1330",
"completedUnits": "246",
"elapsedMs": "1053",
"pendingUnits": "13269",
"totalSlotMs": "15647"
}
],
"totalBytesBilled": "46137344",
"totalBytesProcessed": "45657839",
"totalPartitionsProcessed": "2",
"totalSlotMs": "15647"
I generally see 0 pendingUnits for most jobs on completion, and I would expect it to be 0 here as well.
Are these tasks that got skipped, perhaps because of the LIMIT (my guess)? If that's the case, shouldn't there be a skippedUnits field?
Upvotes: 0
Views: 181
Reputation: 4384
Yes. A LIMIT clause on an unordered set of rows is one example where not all possible work units need to be completed to satisfy the query stage. The query stage statistics are a better place than the timeline to understand where these units come from, because they are correlated with a specific execution stage.
The timeline is simply a series of snapshots estimating work state at a given moment. It's not as concerned with qualifying how an individual unit of work transitioned.
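To make the per-stage comparison concrete, here is a minimal sketch of finding stages that finished with work units left over. It assumes you have already copied each stage's counters out of the job statistics. `StageProgress` is a hypothetical local class, not part of the BigQuery client. Only its two counter names mirror the REST fields `parallelInputs` and `completedParallelInputs`, and the numbers in `main` are made up. In the Java client the same data should be reachable through the job's query statistics (the query plan), but check the client Javadoc for the exact accessors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local stand-in for one query stage's statistics.
// It is NOT a BigQuery client class; only the two counter names mirror
// the REST fields "parallelInputs" and "completedParallelInputs".
final class StageProgress {
    final String name;
    final long parallelInputs;
    final long completedParallelInputs;

    StageProgress(String name, long parallelInputs, long completedParallelInputs) {
        this.name = name;
        this.parallelInputs = parallelInputs;
        this.completedParallelInputs = completedParallelInputs;
    }

    // Work units the stage could have run but did not need to finish.
    long unfinishedInputs() {
        return Math.max(0L, parallelInputs - completedParallelInputs);
    }

    // Keeps only the stages that ended with unfinished work units.
    static List<StageProgress> withUnfinishedWork(List<StageProgress> stages) {
        List<StageProgress> result = new ArrayList<>();
        for (StageProgress stage : stages) {
            if (stage.unfinishedInputs() > 0) {
                result.add(stage);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration; they are not from the job above.
        List<StageProgress> stages = new ArrayList<>();
        stages.add(new StageProgress("input", 100L, 7L));
        stages.add(new StageProgress("output", 1L, 1L));
        for (StageProgress stage : withUnfinishedWork(stages)) {
            System.out.println(stage.name + ": " + stage.unfinishedInputs() + " inputs not completed");
        }
    }
}
```

A stage that shows a gap here, typically one feeding a LIMIT, is a likely source of the pendingUnits still visible in the last timeline snapshot.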
Upvotes: 2