Reputation: 173
I've successfully managed to duplicate a DB to another one with a Kafka cluster: the source DB is connected via the Debezium MySQL connector, while the sink database gets written to by the Debezium JDBC connector.
So far I've been able to fully replicate the database in insert mode, but of course that leads to duplicated records any time I reset the Kafka cluster, so I'd like to configure the sink connector in upsert mode.
The tricky part is that the database is somewhat large, so it would be impractical to manually define the id fields for every table, and not every table has a key defined.
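Here is a minimal sketch of the kind of sink configuration I'm using (connection settings and topic names are placeholders; primary.key.mode is the property I've been varying):

```json
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "topics.regex": "source\\..*",
    "connection.url": "jdbc:mysql://sink-host:3306/targetdb",
    "connection.username": "user",
    "connection.password": "pass",
    "insert.mode": "upsert",
    "primary.key.mode": "record_key"
  }
}
```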
So these are the errors I get depending on which primary.key.mode is set:

- record_key: when the connector reaches a table without a defined id it crashes, as the key messages in that table's topic are null
- kafka: it kind of works, but it leads to the creation of three additional columns in the target tables, which is not desirable
- record_value: the connector crashes immediately, asking me to define primary.key.fields. Maybe I've misunderstood the official documentation, but I'm not expecting this kind of behaviour.
So am I out of luck, and should I not expect Debezium to smartly handle this kind of scenario by itself?
Upvotes: 0
Views: 1071
Reputation: 21133
record_key: when the connector reaches a table without a defined id it crashes, as the key messages in that table's topic are null
It's not expected behavior for any JDBC sink connector to consume from multiple topics whose data sets have inconsistent key structures that differ from the connector configuration. In this situation, the expectation is that you set up one sink connector using record_key for the topics whose records have keys, and a completely separate connector using record_value for the keyless tables, to define a surrogate key set-up for the target table; see the sketch below.
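For illustration, a minimal sketch of that two-connector split (connector names, topic names and key fields are hypothetical):

```json
{
  "name": "jdbc-sink-keyed",
  "config": {
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "topics": "source.db.orders,source.db.customers",
    "insert.mode": "upsert",
    "primary.key.mode": "record_key"
  }
}
```

and, separately, for the keyless tables:

```json
{
  "name": "jdbc-sink-keyless",
  "config": {
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "topics": "source.db.audit_log",
    "insert.mode": "upsert",
    "primary.key.mode": "record_value",
    "primary.key.fields": "event_time,event_type"
  }
}
```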
kafka: it kind of works, but it leads to the creation of three additional columns in the target tables, which is not desirable
With schema.evolution set to none, it's expected that you or your DBA are responsible for making sure the target table's structure aligns with the data set.
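As a sketch of what that implies (the key-column names below are my assumption of the defaults, not from the question):

```json
{
  "insert.mode": "upsert",
  "primary.key.mode": "kafka",
  "schema.evolution": "none"
}
```

With this combination the target table must already contain the Kafka-coordinate key columns (by default __connect_topic, __connect_partition and __connect_offset, if memory serves); with schema.evolution set to basic, the connector creates them itself, which is exactly the three-extra-columns effect described above.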
record_value: the connector crashes immediately, asking me to define primary.key.fields. Maybe I've misunderstood the official documentation, but I'm not expecting this kind of behaviour.
You should be able to use record_value in this context, and that should mean that all columns in the value's Struct are considered primary key fields; however, you must be careful with this set-up, particularly if you have data types in your value fields that may not be valid for primary keys in your target database, such as TEXT or LOB columns.
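If the all-columns key is the problem, one option is to narrow it explicitly; a sketch with hypothetical field names:

```json
{
  "insert.mode": "upsert",
  "primary.key.mode": "record_value",
  "primary.key.fields": "order_id,line_no"
}
```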
If you are facing an issue with record_value where you have simply VARCHAR and NUMERIC columns and are being told you need to specify primary.key.fields, then I would say that's a regression; please open a Jira so we can get that rectified.
Upvotes: 1