Reputation: 567
We have a pipeline set to receive data from Apache Alooma to Bigquery (partitioned tables based on _PARTITIONTIME).
Observation so far is data is not directly dumped into _PARTITIONTIME column rather it first goes into 'null' partition and then gradually dumped into _PARTITIONTIME column .
example:
Query 1:
select DATE(_PARTITIONTIME) , count(1) from `<table>` where
FORMAT_DATETIME( "%Y-%m-%d %T" , <col1>) >= "2019-05-07 00:02:00" and FORMAT_DATETIME( "%Y-%m-%d %T",<col1>)<= "2019-05-07 23:59:59"
group by 1
Output :
1 null 21492
2 2019-05-07 6633
around 3 minutes reran the query and found the output as this
1 null 26355
2 2019-05-07 6633
Note -- Pulled record with a unique key from this partition and ran Query 2.
after 10-15 minutes again reran the query with the below results:
1 2019-05-07 28125
2 null 16660
To support the observation ran below query
Query 2:
select * from `<table>` where unique_col >= '5cd0d13d0ae0' and
DATE(_PARTITIONTIME) = '2019-05-07'
--0 record
select * from `<table>` where unique_col >= '5cd0d13d0ae0' and
_PARTITIONTIME is null
--output appeared
record found in _PARTITIONTIME column after 4 min
select * from `<table>` where unique_col >= '5cd0d13d0ae0' and
DATE(_PARTITIONTIME) = '2019-05-07'
--output appeared
As per the above observation, seems the data first gets into the temporary null partition and from null to actual partition column. Is this understanding correct?
Upvotes: 0
Views: 96
Reputation: 3642
seems the data first gets into temporary null partition and from null to actual partition column . Is this understanding correct ?
Yes it's. As can be seen from BigQuery documentation when streaming data into partition table the data is 1st stored in a streaming buffer with NULL value in the partition field. Once BigQuery finish inserting the record the _PARTITIONTIME field will hold the relevant date value
This is the relevant text from BigQuery documentation
Newly arriving data will be temporarily associated with the UNPARTITIONED partition while in the streaming buffer. A query can therefore exclude data in the streaming buffer from a query by filtering out the NULL values from the UNPARTITIONED partition by using one of the pseudocolumns
Upvotes: 2