Paul

Reputation: 906

Kafka Connect - Sink connector - set use schema id in connector config

I've been trying to find available connector configurations for Kafka Connect that will allow me to produce payloads to the Kafka topic without having to specify the payload schema structure or add the schema ID. In other words, so far I've only been able to get a JDBC sink connector (Kafka -> Postgres) working for the following two scenarios:

  1. JSON payloads formatted together with an embedded schema definition (no schema registry) -> blog post
from datetime import datetime
from confluent_kafka import Producer
import json
payload = {
    "schema": {
        "type": "struct", 
        "fields": [
            {
                "type": "string", 
                "field": "value"
            }, 
            {
                "type": "int64", 
                "field": "value_number"
            }, 
            {
                "type": "string", 
                "field": "timestamp"    
            }
        ],
        "optional": False,
        "name": "postgres-sink"
    },
    "payload": {
        "value": "example",
        "value_number": 5,
        "timestamp": str(datetime.utcnow())
    }
}
send = json.dumps(payload).encode('utf-8')
producer_conf = {
    #... connection details
}
producer = Producer(producer_conf)
producer.produce(topic='topic', value=send)
producer.flush()
  2. Produced payloads that include the schema ID (schema registered in the schema registry) in the message, per the wire format, so that the connector's converter knows how to deserialize the record:
import struct
import json
from io import BytesIO
from datetime import datetime
from confluent_kafka import Producer

class _ContextStringIO(BytesIO):
    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
        return False

def serialize(content: dict, schema_id: int) -> bytes:
    """Manually apply the Confluent wire format: magic byte 0 + 4-byte schema ID + JSON payload."""
    with _ContextStringIO() as fp:
        fp.write(struct.pack('>bI', 0, schema_id))
        fp.write(json.dumps(content).encode('utf-8'))
        return fp.getvalue()

payload = serialize({
    "value": "example-ser",
    "value_number": 5,
    "timestamp": str(datetime.utcnow())
}, 2)   # 2 = the schema ID registered in the schema registry
producer_conf = {
    #... connection details
}
producer = Producer(producer_conf)
producer.produce(topic='topic', value=payload)
producer.flush()
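(For context, this manual wire-format packing is essentially what confluent_kafka's schema-registry-aware JSON serializer does for you. A minimal sketch, assuming a registry at http://localhost:8081 and a placeholder schema string rather than my actual registered schema:)

from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.json_schema import JSONSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# placeholder schema string - swap in the schema actually registered for the topic's subject
schema_str = json.dumps({
    "type": "object",
    "properties": {
        "value": {"type": "string"},
        "value_number": {"type": "integer"},
        "timestamp": {"type": "string"}
    }
})

sr_client = SchemaRegistryClient({'url': 'http://localhost:8081'})   # assumed registry URL
json_serializer = JSONSerializer(schema_str, sr_client)

value = json_serializer(
    {"value": "example-ser", "value_number": 5, "timestamp": str(datetime.utcnow())},
    SerializationContext('topic', MessageField.VALUE)
)
producer.produce(topic='topic', value=value)
producer.flush()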

Even though the second approach is almost there, I need to handle Kafka messages serialized simply as:

payload = json.dumps({
    "value": "example-2",
    "value_number": 5,
    "timestamp": str(datetime.utcnow())
}).encode('utf-8')

I was therefore hoping to be able to configure the connector in Kafka Connect with the schema ID from the schema registry, so that it can deserialize and format the payload for the Postgres inserts. The config for my JDBC connector is (created via the Connect REST API):

import requests

name = '<connector name>'
config = {
    # ... jdbc connection details and config
    "connector.class": 'io.confluent.connect.jdbc.JdbcSinkConnector',   # use JDBC connector as sink
    # etc...

    # ... reporter config 

    # converters for kafka to the destination
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",   
    "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
    "value.converter.schema.registry.url": "<host>",
    "value.converter.ignore.default.for.nullables": "true",         # ignore null values
    "value.converter.schemas.enable": "false",                      # true | false -> include schema in the message
    "value.converter.subject.name.strategy": "io.confluent.kafka.serializers.subject.TopicNameStrategy", # subject name strategy
    "value.converter.use.schema.id": "2",

    # ...dead letter queue specifications
}

res = requests.put(f'http://localhost:8083/connectors/{name}/config', json=config)
print(f"returned status code: {res.status_code} (reason: {res.reason})")
res.raise_for_status()

which I can see reflected in the connector's initialization config (if I'm interpreting correctly that this is what I need to set):

[screenshot: connector deserialization config]
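(The same can be double-checked through the Connect REST API, for example:)

import requests

# GET /connectors/<name>/config returns the config the worker actually applied
cfg = requests.get(f'http://localhost:8083/connectors/{name}/config').json()
print(cfg.get('value.converter.use.schema.id'))

# GET /connectors/<name>/status shows whether the connector and its tasks are RUNNING or FAILED
print(requests.get(f'http://localhost:8083/connectors/{name}/status').json())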

However, when I send payloads without the schema ID, the connector still can't deserialize them, and the stack trace shows an "Unknown magic byte!" error:

[screenshot: deserialization error stack trace]

Can someone please tell me whether I'm setting the right connector config properties (specifying the schema by ID or subject name) so that I can produce plain JSON payloads without including any schema information... or whether what I'm trying to do isn't possible?

I can confirm that my schema is available on the schema registry under that ID.
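(For example, fetching it from the registry's REST API by ID:)

import requests

# GET /schemas/ids/<id> returns the schema registered under that ID
res = requests.get('<host>/schemas/ids/2')   # <host> = schema registry URL, as in the converter config
print(res.status_code, res.json())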

Upvotes: 1

Views: 662

Answers (1)

Paul

Reputation: 906

Joh, leaps and bounds of feedback on this question, lol. Anyway, I found a solution after getting a few leads from the Slack community. The 'best' solution I could get thus far is to use a custom Kafka Connect plugin, jcustenborder/kafka-connect-json-schema. Details:

In a nutshell, the plugin allows Kafka records to be published without any reference to a JSON schema or schema ID (as required by the wire format in the official Kafka Connect documentation). The schema info for records on a topic is instead supplied in the connector config, via (1) an inline schema string, (2) a file location containing the schema definition, or (3) a URL (which can point at the schema registry's REST endpoint for the JSON schema, so you still get central schema management and evolution).

For my purpose, I just defined the whole JSON schema inline in the config, e.g.:

import json

schema = {
  "$id": "https://example.com/person.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "User",
  "type": "object",
  "properties": {
    "value": {
      "type": "string",
      "description": "Random string value to set"
    },
    "value_number": {
      "type": "integer",
      "description": "Integer value to be set"
    },
    "timestamp": {
      "type": "string",
      "description": "String representation of timestamp"
    }
  }
}

config = {
    'name': 'small',
    'tasks.max': 1,
    'topics': 'bulk.small',
    'batch.size': 500,
    'connector.class': 'io.confluent.connect.jdbc.JdbcSinkConnector',
    'connection.url': 'jdbc:postgresql://xxx',
    'connection.user': 'xxx',
    'connection.password': 'xxx',
    'table.name.format': 'kafka_connect.small',
    'insert.mode': 'INSERT',
    'auto.create': 'true',
    'auto.evolve': 'false',
    'pk.mode': 'none',

    # pass the raw bytes through and let the transform build the structured record
    'key.converter': 'org.apache.kafka.connect.storage.StringConverter',
    'value.converter': 'org.apache.kafka.connect.converters.ByteArrayConverter',

    # FromJson transform from jcustenborder/kafka-connect-json-schema, schema supplied inline
    'transforms': 'fromJson',
    'transforms.fromJson.type': 'com.github.jcustenborder.kafka.connect.json.FromJson$Value',
    'transforms.fromJson.json.schema.location': 'Inline',
    'transforms.fromJson.json.schema.inline': json.dumps(schema),

    'errors.tolerance': 'all',
    'errors.log.enable': 'true',
    'errors.log.include.messages': 'true',
    'errors.deadletterqueue.topic.name': 'bulk.small-dlq',
    'errors.deadletterqueue.topic.replication.factor': '1'
}

# rest call to create connector from json config body
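# e.g. same pattern as in the question, assuming a local Connect worker on port 8083:
import requests

res = requests.put(f"http://localhost:8083/connectors/{config['name']}/config", json=config)
print(f"returned status code: {res.status_code} (reason: {res.reason})")
res.raise_for_status()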
  • Last step: testing the created connector by sending records through the Kafka topic:
import json
from datetime import datetime
from confluent_kafka import Producer

producer = Producer({
    'bootstrap.servers': "xxx",     
    'sasl.username': "xxx",
    'sasl.password': "xxx",
    'sasl.mechanism': 'PLAIN',
    'security.protocol': 'SASL_PLAINTEXT'
})

def send_databytes(topic: str, data: list[bytes] | bytes) -> str:
    try:
        if producer is None:
            raise Exception("Not instantiated")
        count = 0

        for record in data:
            producer.produce(topic=topic, value=record)
            count += 1

        print(f"produced data count: [{count}]")
        producer.flush()
        return f"data produced to topic [{count} items]"
    except Exception as e:
        raise Exception(f"fail during transmit: {e}")

val = 1   # counter for generating distinct test records
packet = {
    'value': f'test_{val}',
    'value_number': val,
    'timestamp': str(datetime.utcnow())
}
bins = json.dumps(packet).encode('utf-8')
send_databytes('bulk.small', [bins])
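
A quick way to verify the rows actually land in the target table (auto-created by the connector), assuming psycopg2 is available and the same connection details as in the connector config:

import psycopg2   # assumption: psycopg2 installed; DSN mirrors the connector's JDBC URL

conn = psycopg2.connect('postgresql://xxx')
with conn, conn.cursor() as cur:
    cur.execute('SELECT * FROM kafka_connect.small LIMIT 5')
    for row in cur.fetchall():
        print(row)
conn.close()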

Upvotes: 0
