Reputation: 55
At this moment I'm in the middle of writing my new application with a microservices architecture. A brief explanation of what my application will do is as follows: when microservice A picks up a product, it scrapes the product and creates a new Task with running: true. Once the product has made its way through the other microservices and microservice C has finished processing it, the Task is set to running: false.
What I'm currently struggling with is that I want microservice A to start the scraping again for the products that have been processed by microservice C. For this I thought of some sort of task system, where each product getting scraped also has a task ID linked to it. The only problem I currently have with this is that microservice A just checks, with a setInterval, whether it can find a running task for the product. What I would like to ask is whether somebody has a method or tip on how to improve or implement such a system in my microservices. Each product needs to be scraped right after the previous one has finished.
All of this is developed in NodeJS, and all of the information is saved in a MongoDB database. The communication between the microservices is done through RabbitMQ.
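Roughly, the polling currently looks like this (a simplified sketch; the Task model, the field names and scrapeProduct are placeholders, not my exact code):

    // Assumes mongoose.connect(...) has been called elsewhere.
    const mongoose = require('mongoose');

    const Task = mongoose.model('Task', new mongoose.Schema({
      productId: String,
      running: Boolean,
    }));

    // Microservice A: every few seconds, check whether the product
    // still has a running task; if not, start scraping it again.
    setInterval(async () => {
      const task = await Task.findOne({ productId: 'some-product', running: true });
      if (!task) {
        await scrapeProduct('some-product'); // placeholder for the real scraper
      }
    }, 5000);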
Any help is very much appreciated.
Upvotes: 0
Views: 303
Reputation: 151
I would like to add two points to this architecture. It seems that every microservice changes the state of the data over time, but the data source is the same.
1. Why not change the data status at every microservice?
For now you are using a boolean value for the one job you started (running: true). We can change it to something like ['scraping', 'compare', 'notify']:
    {
      ...
      status: 'scraping',
      jobId: 23,
      ...
    }
Now when the data is at the last microservice, C, it can publish a new job with a status of 'notify' for consumer microservice A; A can conditionally handle this scenario and rescrape if required, as in the sketch below.
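A rough sketch of that flow with amqplib (the queue name, the message shape and rescrape() are assumptions for illustration, not a prescribed API):

    const amqp = require('amqplib');

    // Microservice C: after finishing its work, publish the job back
    // with status 'notify' so that A can pick it up.
    async function publishNotify(job) {
      const conn = await amqp.connect('amqp://localhost');
      const channel = await conn.createChannel();
      await channel.assertQueue('jobs.notify', { durable: true });
      channel.sendToQueue(
        'jobs.notify',
        Buffer.from(JSON.stringify({ ...job, status: 'notify' })),
        { persistent: true }
      );
    }

    // Microservice A: consume 'notify' jobs and rescrape if required.
    async function consumeNotify() {
      const conn = await amqp.connect('amqp://localhost');
      const channel = await conn.createChannel();
      await channel.assertQueue('jobs.notify', { durable: true });
      channel.consume('jobs.notify', async (msg) => {
        const job = JSON.parse(msg.content.toString());
        if (job.status === 'notify') {
          await rescrape(job); // hypothetical: kick off the next scrape
        }
        channel.ack(msg);
      });
    }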
Another benefit is that every microservice can conditionally identify a job on the basis of its job status. Hence, in any case of failure or restart, every microservice will only perform a task if it fits its criteria. For example, microservice B won't start a job that doesn't have 'scraping' as its status.
Basically, acknowledge a job only once it has completed, using channel.ack(message).
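For example, microservice B's consumer could look like this (a sketch; compare() and the queue name are assumptions):

    const amqp = require('amqplib');

    async function startCompareConsumer() {
      const conn = await amqp.connect('amqp://localhost');
      const channel = await conn.createChannel();
      await channel.assertQueue('jobs.compare', { durable: true });

      channel.consume('jobs.compare', async (msg) => {
        const job = JSON.parse(msg.content.toString());

        // B only starts jobs that are in the 'scraping' state.
        if (job.status !== 'scraping') {
          channel.nack(msg, false, false); // not B's job: reject without requeueing
          return;
        }

        try {
          await compare(job);  // hypothetical comparison step
          channel.ack(msg);    // acknowledge only once the work has completed
        } catch (err) {
          channel.nack(msg, false, true); // failed: requeue for a retry
        }
      });
    }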
2. Data synchronization
I would not recommend creating multiple B microservices as consumers; there might be an issue with data synchronization when multiple B consumers work on the same page with different products. Instead, either measure your list of products on a per-page basis and adjust your queue configuration accordingly with some testing (but avoid overly long queues, as they will deteriorate speed and affect performance), or bundle the products of a page as one job and send it for processing; see the sketch below.
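One knob worth testing here is the channel prefetch count, which caps how many unacknowledged jobs a single consumer holds at a time (a sketch; the payload shape and processPage() are illustrative):

    const amqp = require('amqplib');

    async function startBoundedConsumer() {
      const conn = await amqp.connect('amqp://localhost');
      const channel = await conn.createChannel();
      await channel.assertQueue('jobs.compare', { durable: true });

      // Each consumer holds at most one unacknowledged job, so a single
      // B instance finishes one page's job before taking the next.
      await channel.prefetch(1);

      channel.consume('jobs.compare', async (msg) => {
        // Bundling all products of a page into one message keeps them
        // together in a single consumer.
        const { page, products } = JSON.parse(msg.content.toString());
        await processPage(page, products); // hypothetical handler
        channel.ack(msg);
      });
    }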
Upvotes: 0