Kotharu

Reputation: 21

Kafka KSQL: rekeying a stream results in data disappearing after a few minutes

The rekeying works fine in the following scenarios:

1. Base topic to stream, then rekey - Good
2. Table to stream, then rekey - Good

But with a table-table join that produces a new table, followed by creating a stream on that resulting table and then rekeying the stream, it appears to work fine for a few minutes (a select query returns the expected result), and then the data disappears from the newly created stream.

-- Table on Table join - works as expected
CREATE TABLE JOINRESULT_T AS 
SELECT d.DEVICE_ID, d.LOCATION_ALIAS,d.UPDATED_BY, 
d.UPDATED_TIMESTAMP AS UPDATED_TIMESTAMP, d.__DELETED AS __DELETED
FROM TABLE1 d LEFT JOIN TABLE2 l ON d.LOCATION_ALIAS=l.ALIAS;

-- Stream from table - works as expected
CREATE STREAM FROMJOIN_S WITH(KAFKA_TOPIC='JOINRESULT_T', VALUE_FORMAT='AVRO');

-- Rekey the above stream - Data disappears after few minutes
CREATE STREAM REKEY_S AS SELECT * FROM FROMJOIN_S PARTITION BY DEVICE_ID;

The DESCRIBE EXTENDED command on the stream displays the count of messages the stream is currently holding, but a SELECT query does not return any results. The PRINT command on the associated topic also does not print anything.


KSQL version: 5.3.1 (also tried the latest version). Partitions = 1, replicas = 1.

I tried to investigate whether the topic's storage on the Kafka server is holding any data, and found the log file and the snapshot files both to be empty.

What could cause the data to disappear once it is written to the topic?

Upvotes: 2

Views: 562

Answers (1)

Andrew Coates

Reputation: 1893

The most likely cause is that the retention on the REKEY_S topic in the Kafka cluster is set low, so the broker is aggressively deleting 'old' messages.

Try investigating what the retention policy and settings are for the REKEY_S topic. The cleanup policy should NOT be compact, and the topic should use the default retention settings of the Kafka cluster.
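To illustrate why a low retention setting makes data vanish, here is a rough sketch of time-based retention. It is a simplified model, not Kafka's actual implementation: the `Segment` class, the `expired_segments` helper, and all timestamps are made up for illustration. It assumes only that a segment becomes eligible for deletion once its newest record timestamp is older than `retention.ms`, and that `-1` disables the time limit.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a log segment of the topic."""
    name: str
    largest_timestamp_ms: int  # timestamp of the newest record in the segment


def expired_segments(segments, now_ms, retention_ms):
    """Return the segments whose newest record is older than the retention window."""
    if retention_ms < 0:  # retention.ms = -1 means no time-based limit
        return []
    return [s for s in segments if now_ms - s.largest_timestamp_ms > retention_ms]


# Illustrative values only.
now_ms = 1_600_000_000_000
one_hour_ms = 60 * 60 * 1000

segments = [
    Segment("old", now_ms - 3 * one_hour_ms),      # newest record is 3 hours old
    Segment("recent", now_ms - 10 * 60 * 1000),    # newest record is 10 minutes old
]

# A low retention (1 hour) versus a 7-day retention.
low = expired_segments(segments, now_ms, retention_ms=one_hour_ms)
week = expired_segments(segments, now_ms, retention_ms=7 * 24 * one_hour_ms)

print([s.name for s in low])   # ['old']
print([s.name for s in week])  # []
```

Note that the comparison is against record timestamps, not the time the records were written to the topic, so records carrying old timestamps can become eligible for deletion soon after they arrive.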

If strange retention settings are the cause, then it would be good to know why the retention is being set that way, and by what. If the what is KSQL, then it's likely a bug and should be raised as a GitHub issue so it can be investigated.

Upvotes: 1
