Shrey

Reputation: 63

Unable to commit consumed offsets to Kafka on checkpoint with Flink's new Kafka consumer API (KafkaSource, Flink 1.14)

I am using the Flink 1.14 Kafka source connector with the code below.

My requirement is that consumed offsets are committed back to Kafka on every checkpoint, so that a restarted job resumes from the last committed offset without re-reading messages.

With the new Flink Kafka consumer API (KafkaSource) I am facing the following problems:

- Offsets are committed back to Kafka only after a delay of about 2-3 seconds, not immediately on checkpoint.
- When you kill the application manually within that 2-3 second window and restart it, the last consumed message has not been committed yet and is read twice (duplicate).

To cross-check this behaviour I tried Flink's older Kafka consumer API (FlinkKafkaConsumer). There it works as expected: as soon as a message is consumed, its offset is committed back to Kafka.

The steps I followed are in the test below. Please suggest if I am missing anything or if any property needs to be added.

import java.util.Properties;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.KafkaSourceBuilder;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;
import org.junit.Test;

public class FlinkKafkaStreamsTest {

    @Test
    public void test() throws Exception {

        System.out.println("FlinkKafkaStreamsTest started ..");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());
        env.enableCheckpointing(500);
        env.setParallelism(4);

        Properties propertiesOld = new Properties();
        Properties properties = new Properties();
        String inputTopic = "input_topic";
        String bootStrapServers = "localhost:29092";
        String groupId_older = "older_test1";
        String groupId = "test1";

        propertiesOld.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServers);
        propertiesOld.put(ConsumerConfig.GROUP_ID_CONFIG, groupId_older);
        propertiesOld.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServers);
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);


        /******************** Old Kafka API **************/
        FlinkKafkaConsumer<String> flinkKafkaConsumer = new FlinkKafkaConsumer<>(inputTopic,
                new KRecordDes(),
                propertiesOld);
        flinkKafkaConsumer.setStartFromGroupOffsets();
        env.addSource(flinkKafkaConsumer).print("old-api");


        /******************** New Kafka API **************/
        KafkaSourceBuilder<String> sourceBuilder = KafkaSource.<String>builder()
                .setBootstrapServers(bootStrapServers)
                .setTopics(inputTopic)
                .setGroupId(groupId)
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .setProperty("enable.auto.commit", "false")
                .setProperty("commit.offsets.on.checkpoint", "true")
                .setProperties(properties)
                .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.LATEST));

        KafkaSource<String> kafkaSource = sourceBuilder.build();

        SingleOutputStreamOperator<String> source = env
                .fromSource(kafkaSource, WatermarkStrategy.forMonotonousTimestamps(), "Kafka Source");

        source.print("new-api");

        env.execute();
    }

    static class KRecordDes implements KafkaDeserializationSchema<String> {
        @Override
        public TypeInformation<String> getProducedType() {
            return TypeInformation.of(String.class);
        }
        @Override
        public boolean isEndOfStream(String nextElement) {
            return false;
        }
        @Override
        public String deserialize(ConsumerRecord<byte[], byte[]> consumerRecord) throws Exception {
            return new String(consumerRecord.value());
        }
    }
}

Note: I also have a requirement for a bounded Flink Kafka source reader in the same code, which is only available in the new API (KafkaSource).
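For reference, this is roughly what I mean by a bounded source with the new API; the stopping offsets used here are only an example, reusing the same topic and bootstrap variables as above:

    KafkaSource<String> boundedSource = KafkaSource.<String>builder()
            .setBootstrapServers(bootStrapServers)
            .setTopics(inputTopic)
            .setGroupId(groupId)
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .setStartingOffsets(OffsetsInitializer.earliest())
            // setBounded makes the source stop once the given offsets are reached
            // (here: the offsets that are the latest at the moment the job starts),
            // so the job finishes instead of running forever
            .setBounded(OffsetsInitializer.latest())
            .build();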

Upvotes: 3

Views: 4458

Answers (1)

renqs

Reputation: 61

From the documentation of Kafka Source:

Note that Kafka source does NOT rely on committed offsets for fault tolerance. Committing offset is only for exposing the progress of consumer and consuming group for monitoring.

When the Flink job recovers from failure, instead of using the committed offsets on the broker, it will restore state from the latest successful checkpoint and resume consuming from the offset stored in that checkpoint, so records after that checkpoint will be "replayed" a little bit. Since you are using a print sink, which does not support exactly-once semantics, you will see duplicated records, which are in fact the records processed after the latest successful checkpoint.
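Also note that a job you kill and resubmit manually only restores from a checkpoint if the checkpoint is retained and you explicitly restart from it. A minimal sketch, assuming a local filesystem checkpoint directory (the path and the CLI invocation are just examples):

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.CheckpointConfig;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(500, CheckpointingMode.EXACTLY_ONCE);
    // keep completed checkpoints when the job is cancelled/killed,
    // so they can later be used as a restore point
    env.getCheckpointConfig().enableExternalizedCheckpoints(
            CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    env.getCheckpointConfig().setCheckpointStorage("file:///tmp/flink-checkpoints");

    // when resubmitting, start from the retained checkpoint, e.g.:
    // ./bin/flink run -s file:///tmp/flink-checkpoints/<job-id>/chk-<n> ...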

For the 2-3 second delay of the offset commit you mentioned, it is caused by the implementation of SourceReaderBase. In short, the SplitFetcher manages a task queue, and when an offset-commit task is pushed into the queue, it will not be executed until the currently running fetch task (a blocking KafkaConsumer#poll() call) returns or times out. The delay can be longer if the traffic is quite small. But note that this will not affect correctness: KafkaSource does not use committed offsets for fault tolerance.
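A heavily simplified illustration of that mechanism (not Flink's actual code): the fetcher thread works on one task at a time, so a commit task enqueued when the checkpoint completes can only run after the current blocking fetch has returned.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Hypothetical sketch of a single-threaded fetcher loop with a task queue.
    class FetcherLoopSketch {
        private final BlockingQueue<Runnable> taskQueue = new LinkedBlockingQueue<>();

        // Runs in the fetcher thread: tasks execute strictly one after another.
        void runLoop() throws InterruptedException {
            while (true) {
                Runnable task = taskQueue.take();
                task.run(); // a fetch task may block here for up to the poll timeout
            }
        }

        // Called when a checkpoint completes: the offset commit is only *enqueued*;
        // it executes after the fetch task that is currently running has finished.
        void enqueueOffsetCommit(Runnable commitTask) {
            taskQueue.offer(commitTask);
        }
    }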

Upvotes: 2
