Google cloud pubsub python synchronous pull

Question

I have one topic and one subscription with multiple subscribers. I want each subscriber to process a fixed number of messages concurrently. For example, suppose 8 messages are being processed: as soon as one of them finishes and is acknowledged, the next message should be taken from the topic, so that 8 messages are always being processed in the background. No message should be delivered to more than one subscriber.

For this I used the synchronous pull method with max_messages = 8, but the next pull only happens after all of the messages have been processed. So we wrote our own scheduler that keeps 8 processes running in the background, each pulling 1 message at a time, but the next message is still only delivered after all 8 messages have finished processing.

Here is my code:

    #!/usr/bin/env python3

    import logging
    import multiprocessing
    import time
    import sys
    import random
    from google.cloud import pubsub_v1

    project_id = 'xyz'
    subscription_name = 'abc'

    NUM_MESSAGES = 4
    ACK_DEADLINE = 50
    SLEEP_TIME = 20

    multiprocessing.log_to_stderr()
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)

    def worker(msg):
        logger.info("Received message:{}".format(msg.message.data))
        random_sleep = random.randint(200,800)
        logger.info("Processing message:{} for {} sec".format(msg.message.data, random_sleep))
        time.sleep(random_sleep)

    def message_puller():
        subscriber = pubsub_v1.SubscriberClient()
        subscription_path = subscriber.subscription_path(project_id, subscription_name)
        while(True):
            try:
                response = subscriber.pull(subscription_path, max_messages=1)
                message = response.received_messages[0]
                msg = message
                ack_id = message.ack_id
                process = multiprocessing.Process(target=worker, args=(message,))
                process.start()
                while process.is_alive():
                    # `ack_deadline_seconds` must be between 10 to 600.
                    subscriber.modify_ack_deadline(subscription_path,[ack_id],ack_deadline_seconds=ACK_DEADLINE)
                    time.sleep(SLEEP_TIME)
                # Final ack.
                subscriber.acknowledge(subscription_path, [ack_id])
                logger.info("Acknowledging message: {}".format(msg.message.data))
            except Exception as e:
                print(e)
                continue

    def synchronous_pull():
        p = []
        for i in range(0,NUM_MESSAGES):
            p.append(multiprocessing.Process(target=message_puller))

        for i in range(0,NUM_MESSAGES):
            p[i].start()

        for i in range(0,NUM_MESSAGES):
            p[i].join()

    if __name__ == '__main__':
        synchronous_pull()
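
To make the target behaviour concrete, here is a minimal sketch that makes no Pub/Sub calls. An in-memory queue stands in for the subscription, and the names run_slots, slot and handler are hypothetical, introduced only for this illustration. Each slot takes the next item as soon as its own item is done, so at most num_slots items are in flight and no item is handed to two slots.

```python
import queue
import threading


def run_slots(items, num_slots, handler):
    """Process items with at most num_slots in flight.

    Returns a list of (slot_id, item) pairs in completion order.
    The queue is only a stand-in for a subscription (assumption).
    """
    source = queue.Queue()
    for item in items:
        source.put(item)

    results = []
    lock = threading.Lock()

    def slot(slot_id):
        while True:
            try:
                # Take exactly one item; a given item goes to one slot only.
                item = source.get_nowait()
            except queue.Empty:
                return  # nothing left to process
            handler(item)
            with lock:
                results.append((slot_id, item))

    threads = [threading.Thread(target=slot, args=(i,)) for i in range(num_slots)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    done = run_slots(range(20), num_slots=8, handler=lambda item: None)
    print(len(done), "items processed")
```

A slot that finishes early does not wait for the other slots before taking its next item, which is the part that a single pull with max_messages = 8 followed by processing the whole batch does not give.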

Also, subscriber.pull sometimes returns no messages even though the while loop keeps running. It fails with the error list index (0) out of range on received_messages[0], so I conclude that subscriber.pull is returning an empty response even though there are messages on the topic; after some time it starts pulling again. Why is this?

I have also tried asynchronous pull with flow control, but duplicate messages showed up on multiple subscribers. If any other approach would solve my issue, please let me know. Thanks in advance.
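
The index error itself can be avoided by checking whether the response contains any messages before indexing into it. As far as I understand, a synchronous pull may return fewer messages than requested, possibly none, even when there is a backlog, so an empty response is probably best treated as "try again" rather than as a failure. Below is a minimal sketch of that guard. The helper pull_one and its parameters are hypothetical, and pull_fn stands for any callable that returns a response with a received_messages list, for example a lambda wrapping the subscriber.pull call from the code above.

```python
import time
from types import SimpleNamespace


def pull_one(pull_fn, retry_delay=0.0, max_attempts=5):
    """Call pull_fn until it yields at least one message.

    Returns the first received message, or None if every attempt
    came back empty. pull_fn is a hypothetical stand-in for the real
    pull call; it must return an object with `received_messages`.
    """
    for _ in range(max_attempts):
        response = pull_fn()
        if response.received_messages:  # empty list means nothing was delivered
            return response.received_messages[0]
        time.sleep(retry_delay)  # back off briefly before retrying
    return None


if __name__ == "__main__":
    # Fake responses: the first pull is empty, the second carries one message.
    fake = iter([
        SimpleNamespace(received_messages=[]),
        SimpleNamespace(received_messages=["m1"]),
    ])
    print(pull_one(lambda: next(fake)))
```

With this guard in place, message_puller would skip the worker start and the ack loop whenever pull_one returns None and simply go round the while loop again.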
