Reputation: 736
Our existing setup used to create a new table for each day, which worked fine with the "WRITE_TRUNCATE" option. However, when we updated our code to use a partitioned table through our Dataflow job, it wouldn't work with WRITE_TRUNCATE.
It works perfectly fine with the write disposition set to "WRITE_APPEND". From what I understood from Beam, WRITE_TRUNCATE may try to delete the table and then recreate it, and since I'm supplying the table decorator it fails to create a new table.
Sample snippet in Python:
beam.io.Write('Write({})'.format(date),
              beam.io.BigQuerySink(
                  output_table_name + '$' + date,
                  create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                  write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
which gives the error:
Table IDs must be alphanumeric
since it tries to recreate the table while we supply the partition decorator in the table name.
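For comparison, here is the same write with WRITE_APPEND, which does go through (a minimal sketch mirroring our snippet above; rows, output_table_name, and date are placeholders from our pipeline):

# Appending to a specific partition via the table decorator works.
rows | beam.io.Write('Write({})'.format(date),
                     beam.io.BigQuerySink(
                         output_table_name + '$' + date,  # e.g. 'dataset.table$20160101'
                         create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))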
Here are some of the things that I've tried:
bq --apilog /tmp/log.txt load --replace --source_format=NEWLINE_DELIMITED_JSON 'dataset.table$20160101' sample_json.json
command, to see if I could observe any logs showing how truncate actually works, based on a link that I found.
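For what it's worth, truncating a single partition does seem to work when going through the BigQuery API directly, outside of Dataflow. Here's a minimal sketch using the google-cloud-bigquery client (the dataset and table names are placeholders; this is my own experiment, not anything from Beam):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
# The partition decorator scopes the truncate to the 20160101 partition only.
with open('sample_json.json', 'rb') as f:
    load_job = client.load_table_from_file(
        f, 'dataset.table$20160101', job_config=job_config)
load_job.result()  # wait for the load job to finish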
Is there a way to write to a partitioned table from a Dataflow job using the WRITE_TRUNCATE method?
Let me know if any additional details are required. Thanks.
Upvotes: 3
Views: 1508
Reputation: 1714
It seems like this is not supported at this time. Credit goes to @Pablo for finding out from the IO dev.
According to the Beam documentation on the GitHub page, their JIRA page would be the appropriate place to request such a feature. I'd recommend filing a feature request there and posting a link in a comment here so that others in the community can follow along and show their support.
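In the meantime, a possible workaround (my own sketch, not an official Beam feature; the table name is a placeholder) is to clear the target partition yourself before the pipeline runs, then use WRITE_APPEND from Dataflow so the job only has to append:

from google.cloud import bigquery

client = bigquery.Client()
# Deleting with a partition decorator removes only that partition's data;
# the table itself (and its schema) stays in place.
client.delete_table('dataset.table$20160101')
# ...then run the Dataflow job with CREATE_NEVER / WRITE_APPEND as in the question.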
Upvotes: 1