Reputation: 111
I'm trying to wrap my head around how Kafka Connect works and I can't understand one particular thing.
From what I have read and watched, I understand that Kafka Connect allows you to send data into Kafka using Source Connectors and read data from Kafka using Sink Connectors. And the great thing about this is that Kafka Connect somehow abstracts away all the platform-specific things and all you have to care about is having proper connectors. E.g. you can use a PostgreSQL Source Connector to write to Kafka and then use Elasticsearch and Neo4J Sink Connectors in parallel to read the data from Kafka.
My question is: how does this abstraction work? Why are Source and Sink connectors written by different people able to work together? In order to read data from Kafka and write them anywhere, you have to expect some fixed message structure/schema, right? E.g. how does an Elasticsearch Sink know in advance what kind of messages would a PostgreSQL Source produce? What if I replaced PostgreSQL Source with MySQL source? Would the produced messages have the same structure?
It would be logical to assume that Kafka requires some kind of a fixed message structure, but according to the documentation the SourceRecord
which is sent to Kafka does not necessarily have a fixed structure:
...can have arbitrary structure and should be represented using org.apache.kafka.connect.data objects (or primitive values). For example, a database connector might specify the
sourcePartition
as a record containing{ "db": "database_name", "table": "table_name"}
and thesourceOffset
as a Long containing the timestamp of the row".
Upvotes: 0
Views: 141
Reputation: 191904
In order to read data from Kafka and write them anywhere, you have to expect some fixed message structure/schema, right?
Exactly. Refer the Javadoc on the Struct and Schema classes of the Connect API as well as the Converter interface
Of course, those are not strict requirements, but without them, then the framework doesn't work across different sources and sinks, but this is no different than the contract between producers and consumers regarding serialization
Upvotes: 2