Reputation: 11
I'm designing a multi-threaded data pipeline that runs on Service A, following a producer-consumer design to process a list of objects. The producer calls the API of Service B to retrieve tasks for each object (there could be millions of tasks per object), and the consumer processes the tasks and posts the results to Service C.
My question is: how do I design this so that when either Service B goes down (the producer can't get tasks) or Service C goes down (the consumer can't post results), I can gracefully stop and save my progress?
I can't store all of my tasks in a BlockingQueue up front, because there would be too many tasks to hold in memory.
I am considering storing the tasks in a DB, but that would dramatically slow down the pipeline, since every task would now incur a write and a read. It also seems like a waste of DB space to store millions of tasks that I only ever need to process once; I would have to delete them all after processing finishes, so the effort feels almost wasted.
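For reference, here is a minimal sketch of the producer-consumer shape I have so far (the Task and service client types are placeholders for my actual code). The bounded queue keeps memory capped, and the stop flag is my current idea for a graceful stop, but it doesn't save progress yet:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Placeholders for my actual task type and service clients.
interface Task {}
interface ServiceBClient { Iterable<Task> fetchTasks(String objectId); }
interface ServiceCClient { void post(Task task); }

public class Pipeline {
    // Bounded queue: the producer blocks when it fills up, so memory stays capped.
    private final BlockingQueue<Task> queue = new LinkedBlockingQueue<>(10_000);
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    void produce(ServiceBClient serviceB, String objectId) throws InterruptedException {
        for (Task task : serviceB.fetchTasks(objectId)) {
            if (stopRequested.get()) return;  // stops gracefully, but progress is lost
            queue.put(task);                  // blocks while the queue is full
        }
    }

    void consume(ServiceCClient serviceC) throws InterruptedException {
        while (!stopRequested.get()) {
            Task task = queue.poll(1, TimeUnit.SECONDS); // timed poll so we can re-check the flag
            if (task != null) {
                serviceC.post(task);
            }
        }
    }

    public void requestStop() {
        stopRequested.set(true);
    }
}
```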
Upvotes: 1
Views: 41
Reputation: 1
Firstly, if you can process the data in smaller chunks rather than in one bulk pass, do so. Message brokers and message queues like RabbitMQ and Apache Kafka can also be useful in this scenario, since they buffer tasks durably outside your process, letting the producer and consumer fail independently. Secondly, you can use cursors or checkpoints to persist the last processed offset somewhere that survives a restart, and then continue processing from the last checkpoint after the system comes back up.
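A rough sketch of the checkpoint idea might look like this (the store, source, and sink interfaces are stand-ins for whatever you actually use; only the checkpoint store needs to be durable):

```java
import java.util.List;

// Stand-ins for your actual components.
interface Task {}
interface CheckpointStore {
    long loadOffset(String objectId);              // 0 if no checkpoint exists yet
    void saveOffset(String objectId, long offset); // write to a DB, file, Redis, etc.
}
interface TaskSource { List<Task> fetchTasks(String objectId, long fromOffset, int limit); }
interface TaskSink { void post(Task task); }

public class CheckpointedWorker {
    private static final int CHUNK_SIZE = 500;

    public void run(String objectId, CheckpointStore store, TaskSource source, TaskSink sink) {
        long offset = store.loadOffset(objectId);  // resume from the last checkpoint
        while (true) {
            List<Task> chunk = source.fetchTasks(objectId, offset, CHUNK_SIZE);
            if (chunk.isEmpty()) {
                break;  // all tasks for this object are done
            }
            for (Task task : chunk) {
                sink.post(task);  // if Service C is down, this throws and we stop here
            }
            offset += chunk.size();
            store.saveOffset(objectId, offset);  // at most one chunk is redone on restart
        }
    }
}
```

If Service B or C fails, the loop simply stops, and the last saved offset is at most one chunk behind, so a restart re-processes at most CHUNK_SIZE tasks. That also means posting to Service C should be idempotent, since a few tasks may be sent twice.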
Upvotes: 0