Race Condition in Confluent Kafka Consumer with Asyncio and ThreadPoolExecutor in Python

Question

I'm working on a Python application where I need to consume messages from a Kafka topic, process them by making an async API request, and produce a response to an outbound Kafka topic. Since the Kafka client I'm using is synchronous (confluent-kafka), I decided to use ThreadPoolExecutor to run the consumer in a separate thread and launch async tasks in the main event loop for I/O-bound operations.

The code works, but I hit a race condition when two messages arrive at almost the same time. The acknowledgment is sent for both messages, but the actual API request (inside fetch_response_from_rest_service) is sent twice for one message and never for the other. The problem shows up at the point I've marked with a comment in the code.

Here’s the relevant code:

import json
import confluent_kafka as kafka
import asyncio
from concurrent.futures import ThreadPoolExecutor
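import logging as logger

# config, executor (a ThreadPoolExecutor), send_ack, async_request and
# kafka_status_callback are defined elsewhere and omitted here.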

async def run_prediction(inbound_topics, outbound_topics):
    consumer_args = {'bootstrap.servers': config.BOOTSTRAP_SERVERS,
                     'group.id': config.APPLICATION_ID,
                     'default.topic.config': {'auto.offset.reset': config.AUTO_OFFSET_RESET},
                     'enable.auto.commit': config.ENABLE_AUTO_COMMIT,
                     'max.poll.interval.ms': config.MAX_POLL_INTERVAL_MS}
    training_consumer = kafka.Consumer(consumer_args)
    training_consumer.subscribe(inbound_topics)
    outbound_producer = kafka.Producer({'bootstrap.servers': config.BOOTSTRAP_SERVERS})
    logger.info(f"Listening to inbound topic {inbound_topics}")
    
    while True:
        msg = training_consumer.poll(timeout=config.KAFKA_POLL_TIMEOUT)
        if not msg:
            continue
        if msg.error():
            logger.info(f"Consumer error: {str(msg.error())}")
            continue
        try:
            send_ack(msg.value(), "MESSAGE_RECEIVED")
            loop = asyncio.get_event_loop()
            loop.run_in_executor(executor, lambda: asyncio.run(
                fetch_response_from_rest_service(message=msg.value().decode(),
                                                 callback=kafka_status_callback(msg, outbound_producer,
                                                                        outbound_topics))))
        except Exception as ex:
            logger.exception(ex)
            

async def fetch_response_from_rest_service(message, callback):
    # The race condition shows up here: `message` holds the wrong payload
    # when two requests arrive at the same time
    message = json.loads(message)
    url = "SOME_ENDPOINT"
    headers = {
        "Content-Type": "application/json"
    }
    response = None
    try:
        logger.info(f"sending request to {url} for payload {message}")
        response = await async_request("POST", url, headers, data=json.dumps(message),
                                       timeout=10)

        response = json.loads(response)
    except Exception as ex:
        logger.exception(f"All retries failed. Error: {ex}")
    finally:
        callback(response)

asyncio.run(run_prediction(["INBOUND_TOPIC"], ["OUTBOUND_TOPIC"]))

The send_ack method just sends an acknowledgment message to a Kafka topic to confirm that processing has started.

I identified the race condition when multiple messages are processed at the same time. The fetch_response_from_rest_service function seems to receive the wrong message for one of the requests, so the same message is processed twice and the other one is dropped.
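The symptom can be reproduced without Kafka. Below is a minimal sketch, assuming the cause is that the lambda handed to run_in_executor reads msg only when the worker thread actually runs it, by which time the polling loop may already have rebound msg to the next message. The names dispatch, process and delayed are made up for illustration, and the Event only forces the unlucky timing so the outcome is repeatable.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def process(value):
    # Placeholder for the real work done on a message
    return value


def dispatch(messages, bind_early):
    """Submit one job per message and return what each job actually saw."""
    release = threading.Event()

    def delayed(job):
        # Hold every worker until the submitting loop has finished,
        # which forces the timing that triggers the problem
        release.wait()
        return job()

    with ThreadPoolExecutor(max_workers=len(messages)) as pool:
        futures = []
        for msg in messages:
            if bind_early:
                # msg is evaluated now and stored inside the partial
                job = partial(process, msg)
            else:
                # msg is looked up later, when the worker calls the lambda
                job = lambda: process(msg)
            futures.append(pool.submit(delayed, job))
        release.set()
        return [f.result() for f in futures]
```

With two messages, dispatch(["first", "second"], bind_early=False) returns ["second", "second"], which matches the "same message twice" symptom, while bind_early=True returns ["first", "second"].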

I tried solving this by locking the section of the code that processes the message variable:

async def fetch_response_from_rest_service(message, callback):
    message_copy_lock = asyncio.Lock()
    async with message_copy_lock:
        logger.info(f"Got message : {json.loads(message)['conversationRequest']['requestId']}")
        message = json.loads(message)
    url = "SOME_ENDPOINT"
    headers = {
        "Content-Type": "application/json"
    }
    response = None
    try:
        logger.info(f"sending request to {url} for payload {message}")
        response = await async_request("POST", url, headers, data=json.dumps(message),
                                       timeout=10)
        response = json.loads(response)
    except Exception as ex:
        logger.exception(f"All retries failed. Error: {ex}")
    finally:
        callback(response)

However, this did not fix the issue.
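One possible reason the lock changes nothing, sketched under assumptions: the lock is created inside the function, so every call gets a brand-new lock that nobody else ever waits on. In addition, message has already been passed in as an argument by the time the function body runs, so a lock around json.loads cannot change which payload was received. The helper names critical and peak_concurrency below are hypothetical and only illustrate the per-call lock versus a shared lock.

```python
import asyncio


async def critical(lock, state):
    # Track how many coroutines are inside the "protected" section at once
    async with lock:
        state["inside"] += 1
        state["peak"] = max(state["peak"], state["inside"])
        await asyncio.sleep(0.01)  # give the other coroutine a chance to run
        state["inside"] -= 1


async def peak_concurrency(shared):
    """Return the largest number of coroutines inside the section together."""
    state = {"inside": 0, "peak": 0}
    if shared:
        lock = asyncio.Lock()
        jobs = [critical(lock, state) for _ in range(2)]
    else:
        # A fresh lock per call, as in the attempt above
        jobs = [critical(asyncio.Lock(), state) for _ in range(2)]
    await asyncio.gather(*jobs)
    return state["peak"]
```

asyncio.run(peak_concurrency(shared=False)) gives 2 (both coroutines are inside together), while shared=True gives 1. An asyncio.Lock is also not thread-safe, so it is not meant to coordinate separate event loops running in different threads.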

My constraints:

I want to keep using the synchronous Kafka client (confluent-kafka) due to project constraints, so I cannot switch to an async client such as aiokafka. With an async client I would have used asyncio.create_task(): the API call could be awaited inside the task while the main coroutine keeps polling for new messages. That path is not available to me because of the project limitations.
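For comparison, the create_task() idea does not strictly need an async client. Here is a rough sketch, not a tested Kafka setup: the blocking poll is pushed to the default thread pool with loop.run_in_executor, and each message is handled by a task on the single main loop. FakeConsumer is a hypothetical stand-in for the blocking consumer so the example runs without a broker, and handle and max_idle_polls are illustrative names.

```python
import asyncio
import queue


class FakeConsumer:
    """Hypothetical stand-in for a blocking client with a poll(timeout) method."""

    def __init__(self, messages):
        self._queue = queue.Queue()
        for message in messages:
            self._queue.put(message)

    def poll(self, timeout):
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None


async def handle(message, results):
    await asyncio.sleep(0.01)  # placeholder for the async API request
    results.append(message)


async def consume(consumer, max_idle_polls=2):
    loop = asyncio.get_running_loop()
    results, tasks, idle = [], set(), 0
    while idle < max_idle_polls:  # a real consumer would loop forever
        # The blocking poll runs in a worker thread; the loop stays free
        msg = await loop.run_in_executor(None, consumer.poll, 0.05)
        if msg is None:
            idle += 1
            continue
        idle = 0
        # msg is passed as an argument right away, so each task keeps its own
        task = asyncio.create_task(handle(msg, results))
        tasks.add(task)  # keep a reference so the task is not garbage collected
        task.add_done_callback(tasks.discard)
    await asyncio.gather(*tasks)
    return results
```

Running asyncio.run(consume(FakeConsumer(["a", "b", "c"]))) handles all three messages on one event loop, with only the poll call living in a thread.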

Questions:

Why is this happening?
How can I avoid the race condition in my current setup?
How do multiple event loops behave when they are started from a single thread, compared with separate event loops running in different threads?
Is my usage of loop.run_in_executor() correct in this context?
Should I be doing something differently to handle parallel requests safely? Would any other synchronization technique work better than the asyncio.Lock()?
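To make the last two questions concrete, one alternative I am aware of is asyncio.run_coroutine_threadsafe, which lets a plain thread hand coroutines to one existing event loop instead of starting a new loop per message with asyncio.run. The sketch below only assumes that pattern; blocking_producer stands in for the thread that would poll Kafka, and handle and run_bridge are hypothetical names.

```python
import asyncio
import threading


async def handle(message):
    await asyncio.sleep(0.01)  # placeholder for the async API request
    return message.upper()


def blocking_producer(loop, messages, futures):
    # Runs in a plain thread, like a blocking consumer loop would
    for msg in messages:
        # handle(msg) binds msg immediately; the coroutine then runs on `loop`
        futures.append(asyncio.run_coroutine_threadsafe(handle(msg), loop))


async def run_bridge(messages):
    loop = asyncio.get_running_loop()
    futures = []
    worker = threading.Thread(
        target=blocking_producer, args=(loop, messages, futures)
    )
    worker.start()
    # Wait for the thread without blocking the event loop
    await loop.run_in_executor(None, worker.join)
    # Each entry is a concurrent.futures.Future; wrap it to await the result
    return [await asyncio.wrap_future(f) for f in futures]
```

Here asyncio.run(run_bridge(["a", "b"])) returns ["A", "B"], and all coroutines share one loop, so loop-bound objects such as an asyncio.Lock would at least live in a single loop.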

Any help or suggestions would be appreciated!
