Reputation: 710
I'm using Airflow to extract BigQuery rows to Google Cloud Storage in Avro format, then load them into Cloud Bigtable with a Dataflow template.
from datetime import datetime

from airflow import models
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

with models.DAG(
    "bigquery_to_bigtable",
    default_args=default_args,
    schedule_interval=None,
    start_date=datetime.now(),
    catchup=False,
    tags=["test"],
) as dag:
    # Export the BigQuery table to GCS as Avro.
    data_to_gcs = BigQueryInsertJobOperator(
        task_id="data_to_gcs",
        project_id=project_id,
        location=location,
        configuration={
            "extract": {
                "destinationUri": gcs_uri,
                "destinationFormat": "AVRO",
                "sourceTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        },
    )
    # Load the exported Avro files into Bigtable via the Dataflow template.
    gcs_to_bt = DataflowTemplatedJobStartOperator(
        task_id="gcs_to_bt",
        template="gs://dataflow-templates/latest/GCS_Avro_to_Cloud_Bigtable",
        location=location,
        parameters={
            "bigtableProjectId": project_id,
            "bigtableInstanceId": bt_instance_id,
            "bigtableTableId": bt_table_id,
            "inputFilePattern": "gs://export/test.avro-*",
        },
    )

    data_to_gcs >> gcs_to_bt
The BigQuery table contains rows like:

row_key      | 1_cnt | 2_cnt | 3_cnt
1#2021-08-03 |   1   |   2   |   2
2#2021-08-02 |   5   |   1   |   5
...

I'd like to use the row_key column as the row key in Bigtable, and the rest of the columns as columns in a specific column family, e.g. my_cf.
However, I get the following error when the Dataflow job loads the Avro files into Bigtable:

java.io.IOException: Failed to start reading from source: gs://export/test.avro-
Caused by: org.apache.avro.AvroTypeException: Found Root, expecting com.google.cloud.teleport.bigtable.BigtableRow, missing required field key
The docs I read say:
The Bigtable table must exist and have the same column families as exported in the Avro files.
How do I export BigQuery data to Avro with the same column families?
Upvotes: 1
Views: 760
Reputation: 4262
I think you have to transform the exported Avro to the proper schema. The documentation you mention also says:
- Bigtable expects a specific schema from the input Avro files.
It links to the specific Avro data schema that has to be used.
If I understand correctly, you are just exporting the table as-is; the result, although valid Avro, will not match the required schema, so you need to transform the data into the schema appropriate for your Bigtable table.
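For illustration, here is a minimal sketch of such a transform using fastavro, run locally on one exported file. The record layout follows the com.google.cloud.teleport.bigtable.BigtableRow schema named in the error message; the my_cf family comes from your question, while the file names and the value encoding are assumptions.

import time

from fastavro import parse_schema, reader, writer

# Avro schema expected by the GCS_Avro_to_Cloud_Bigtable template:
# a row key plus an array of (family, qualifier, timestamp, value) cells.
bigtable_row_schema = parse_schema({
    "type": "record",
    "name": "BigtableRow",
    "namespace": "com.google.cloud.teleport.bigtable",
    "fields": [
        {"name": "key", "type": "bytes"},
        {"name": "cells", "type": {
            "type": "array",
            "items": {
                "type": "record",
                "name": "BigtableCell",
                "fields": [
                    {"name": "family", "type": "string"},
                    {"name": "qualifier", "type": "bytes"},
                    {"name": "timestamp", "type": "long"},
                    {"name": "value", "type": "bytes"},
                ],
            },
        }},
    ],
})


def to_bigtable_row(record, family="my_cf"):
    # One cell per non-key column; timestamp is in microseconds.
    ts = int(time.time() * 1_000_000)
    return {
        "key": record["row_key"].encode("utf-8"),
        "cells": [
            {
                "family": family,
                "qualifier": column.encode("utf-8"),
                "timestamp": ts,
                # Encoding values as UTF-8 strings is a choice made here;
                # use whatever byte encoding your readers expect.
                "value": str(value).encode("utf-8"),
            }
            for column, value in record.items()
            if column != "row_key"
        ],
    }


# "test.avro" / "bigtable.avro" are placeholder local file names.
with open("test.avro", "rb") as src, open("bigtable.avro", "wb") as dst:
    writer(dst, bigtable_row_schema, (to_bigtable_row(r) for r in reader(src)))

In practice you would run such a transform as an intermediate step between the export and the template job (e.g. a PythonOperator or a small Beam pipeline). The key point is that every record must carry a key plus a cells array, rather than the flat one-field-per-column records that BigQuery exports.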
Upvotes: 1