Reputation: 6246
I would like to request some guidelines and advice about a problem we are facing. This is the structure of the system:
Desktop client calls (HttpPost) => Web API 1 calls (HttpPost) => Web API 2
Web API 1 is internal and we can control/update it. Web API 2 is external and we cannot update/improve it.
The desktop client sends a very large amount of data in JSON format, in hundreds of requests every day. The data is sent and received to populate a local database on the desktop client's side.
The bottleneck is at Web API 2: it is very slow and each request takes a very long time to complete. As a result, Web API 1 usually gets timeouts, and the desktop client also gets timeouts and keeps loading forever. Sometimes the process can take days.
The desktop client calls Web API 1 via HttpClient.
The approach I see is to create an intermediate DB/table at the Web API 1 level that stores the data that should be sent to Web API 2. We would receive this data from the desktop client and store it in the DB immediately. A scheduled job/process could then read the new records in this intermediate DB/table and send the data to Web API 2. This process could apply retry logic in case of timeouts, and store the outcome of each request so that a report is available showing the progress and the number of errors.
In this case the table structure could be something like Job (JobId/Progress/Status) => DataItem (Data/AddedDate/Status/ErrorMessage).
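For illustration only, here is a minimal C# sketch of how those two entities could be modelled; the class and property names are just assumptions based on the columns above, not an existing schema:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entities for the intermediate store; names mirror the proposed columns.
public enum ItemStatus { Pending, Sent, Failed }

public class Job
{
    public Guid JobId { get; set; }
    public int Progress { get; set; }          // e.g. number or percentage of items processed
    public ItemStatus Status { get; set; }
    public List<DataItem> Items { get; set; } = new();
}

public class DataItem
{
    public long DataItemId { get; set; }
    public string Data { get; set; } = "";     // raw JSON payload to forward to Web API 2
    public DateTime AddedDate { get; set; }
    public ItemStatus Status { get; set; }
    public string? ErrorMessage { get; set; }
}
```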
Does adding this intermediate database to the process flow make sense to you? Would you recommend other performance-oriented options for this scenario, such as queues, caching, or NoSQL storage?
Any suggestion or architecture guideline will be appreciated. I am not an architect or an expert, but I would like to identify which roads and approaches can be taken. I think this is a common problem in data systems that process high data volumes.
Upvotes: 1
Views: 1285
Reputation: 22714
Your thoughts regarding architecture are a good starting point.
When the client sends requests to the internal Web API, that API should be considered an "ingestion service": it accepts the requests, stores them in intermediate persistent storage, and acknowledges that fact back to the client.
HTTP status code 202 (Accepted) is usually used in these situations, where the request has been received and accepted but not yet processed.
The response can (but does not have to) contain an identifier for the request, which can later be used to retrieve the processing status, for example via a GET /{id}/status endpoint.
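A minimal ASP.NET Core sketch of this accept-and-acknowledge pattern could look like the following; the IRequestStore abstraction and the route names are assumptions made for the example, not an existing API:

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical persistent store abstraction used only for this sketch.
public interface IRequestStore
{
    Task<Guid> SaveAsync(string payload);
    Task<string?> GetStatusAsync(Guid id);
}

[ApiController]
[Route("api/ingest")]
public class IngestionController : ControllerBase
{
    private readonly IRequestStore _store;

    public IngestionController(IRequestStore store) => _store = store;

    // Accept the payload, persist it for later forwarding to Web API 2,
    // and acknowledge immediately with 202 Accepted plus the request id.
    [HttpPost]
    public async Task<IActionResult> Post([FromBody] JsonElement payload)
    {
        var id = await _store.SaveAsync(payload.GetRawText());
        return AcceptedAtAction(nameof(GetStatus), new { id }, new { id });
    }

    // Lets the client poll the processing status of a previously accepted request.
    [HttpGet("{id}/status")]
    public async Task<IActionResult> GetStatus(Guid id)
    {
        var status = await _store.GetStatusAsync(id);
        if (status is null) return NotFound();
        return Ok(status);
    }
}
```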
As said, the internal Web API acts as an ingestion service. It has local (or, in the case of high availability, replicated) storage. If you want a robust solution (so that pending requests can survive a service crash), the storage should be persistent, not ephemeral.
The choice of storage depends on a lot of things, so without knowing the exact requirements and circumstances it is impossible to propose a concrete solution.
Here I want to emphasize one thing regarding retry: do not blindly apply retry logic to every request. Retrying only makes sense when certain preconditions are met, in particular that the failure is transient and that the request can safely be repeated (is idempotent).
For further details, please check out my article regarding retry policies in general.
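As a rough illustration only, a bounded retry with exponential backoff could look like this; the sendToWebApi2 delegate is a placeholder for the actual forwarding call and the attempt count and delays are arbitrary:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RetryHelper
{
    // Naive bounded retry with exponential backoff. Only appropriate when the
    // failure is transient and the forwarded request is idempotent.
    public static async Task<bool> ForwardWithRetryAsync(
        Func<Task<HttpResponseMessage>> sendToWebApi2,   // placeholder for the actual call
        int maxAttempts = 3)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                var response = await sendToWebApi2();
                if (response.IsSuccessStatusCode) return true;
            }
            catch (HttpRequestException) { /* network error: treat as transient */ }
            catch (TaskCanceledException) { /* timeout: treat as transient */ }

            if (attempt < maxAttempts)
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));  // 2s, 4s, ...
        }
        return false;   // give up; record the error on the data item instead
    }
}
```

In a real system a resilience library such as Polly is usually a better fit than a hand-rolled loop like this.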
Upvotes: 2