Reputation: 1
I could really use some assistance!
I've set up a pipeline to copy data from my managed SQL (PostgreSQL) on GCP to BigQuery.
I followed these guides:
Google Cloud Datastream Documentation
Above guide mentions that PG to BQ is supported. I successfully created a Dataflow template as outlined here: Datastream to BigQuery Template
The pipeline requires two BigQuery datasets: the main dataset and a staging dataset. After getting the pipeline running, I started seeing data being ingested into the staging dataset. The pipeline auto-created tables and columns and added the metadata* fields, but it did not add the _metadata_uuid and _metadata_lsn fields, which is causing the merge step to fail.
I checked the data in the GCS bucket that holds the delta files, and I can see the LSN field present in the metadata JSON.
Does anyone know why the _metadata_uuid and _metadata_lsn fields were not auto-created in the BigQuery datasets? Any help would be greatly appreciated!
Thanks in advance!
I was able to get merge step to work by manually adding _metadata_uuid
and _metadata_lsn
but this feels wrong. Especially if I have dozens of tables.
The JSON payloads in bucket have the following structure that contains LSN. This makes me thing that the actual dataflow (which I used of the template) is not correctly setup.
{
"uuid": "ffaa95d1-71d2-4574-97f6-c65e00000101",
"read_timestamp": "2024-11-20T01:23:48.963000Z",
"source_timestamp": "2024-11-20T01:23:47.615000Z",
"object": "public_appointments",
"read_method": "postgresql-cdc",
"stream_name": "projects/[HIDDEN_FOR_STACKOVER]/locations/us-west1/streams/postgres-stream",
"schema_key": "68c2081e17f1eed002116c41312188c642bd003b",
"sort_keys":
[
1732065827615,
"C4/390118C0"
],
"source_metadata":
{
"schema": "public",
"table": "appointments",
"is_deleted": false,
"change_type": "UPDATE",
"tx_id": 3088657,
"lsn": "C4/390118C0",
"primary_keys":
[
"id",
"tenant_id"
]
},
"payload":
{
(actual data)
}
}
The _metadata fields that were auto created are:
"_metadata_timestamp",
"_metadata_read_timestamp",
"_metadata_read_method",
"_metadata_source_type",
"_metadata_deleted",
"_metadata_change_type",
"_metadata_row_id",
"_metadata_scn",
"_metadata_ssn",
"_metadata_rs_id"
And we can see that dataflow job created _metadata_scn
but this is an Oracle field!
Upvotes: 0
Views: 24