Jacobian
Jacobian

Reputation: 10802

Postgres Debezium does not publish the previous state of a record

I successfully installed Postgres Debezium CDC. Now, I'm able to catch all changes happening to the database. But the problem is "before" field always stays empty. So, if I insert a record (id = 1, name = Bill) I then get from Kafka this data:

'payload': {'before': None, 'after': {'id': 1, 'name': 'Bill'}, ...

But if I update the record like so:

UPDATE mytable set name = 'Bob' WHERE id = 1

I get this from Kafka:

'payload': {'before': None, 'after': {'id': 1, 'name': 'Bob'}, ...

This is how I configured my connector:

curl -X POST  localhost:8083/connectors/ \
  -H "Accept:application/json" -H "Content-Type:application/json" -d \
'{
    "name": "test-connector",
    "config": {
         "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
         "tasks.max": "1",
         "plugin.name": "pgoutput",
         "database.hostname": "postgres",
         "database.port": "5432",
         "database.user": "postgres",
         "database.password": "postgres",
         "database.dbname" : "test",
         "database.server.name": "postgres",
         "database.whitelist": "public.mytable",
         "database.history.kafka.bootstrap.servers": "kafka:9092",
         "database.history.kafka.topic": "public.topic"
    }
}'

What is wrong with that and how can I fix it?

Upvotes: 7

Views: 7101

Answers (2)

Tim108
Tim108

Reputation: 1

To add to Iskuskov's answer: Adding data to the REPLICA IDENTITY impacts database performance. If you don't need all columns, you can also set the REPLICA IDENTITY to a specific index to only use one or a few columns.

note: Seems like you always need the primary key in that index for Debezium to work

Relevant Postgres docs

Upvotes: 0

Iskuskov Alexander
Iskuskov Alexander

Reputation: 4365

before is an optional field that if present contains the state of the row before the event occurred. Whether or not this field is available is highly dependent on the REPLICA IDENTITY setting for each table.

REPLICA IDENTITY is a PostgreSQL specific table-level setting which determines the amount of information that is available to logical decoding in case of UPDATE and DELETE events.

To show the previous values of all the table’s columns, please set the REPLICA IDENTITY level to FULL:

ALTER TABLE public.mytable REPLICA IDENTITY FULL;

See more details in the Debezium docs.

Upvotes: 13

Related Questions