Reputation: 6246
I would like to request some guidelines and advice about a problem we are facing. This is the structure of the system:
Desktop client calls (HttpPost) => Web API 1 calls (HttpPost) => Web API 2
Web API 1 is internal and we can control/update it. Web API 2 is external and we cannot update/improve it.
The desktop client sends a very large amount of data in JSON format, in hundreds of requests every day. The data is sent and received to populate a local database on the desktop client's side.
The bottleneck is at Web API 2: it is very slow and each request takes a very long time to complete. As a result, Web API 1 usually gets timeouts, and the desktop client also gets timeouts and keeps loading forever. Sometimes the process can take days.
The desktop client calls Web API 1 via HttpClient.
The approach I see is to create an intermediate DB/table at the Web API 1 level that stores the data that should be sent to Web API 2. We would receive this data from the desktop client and store it in the DB immediately. A scheduled job/process could then read the new records in this intermediate DB/table and send the data to Web API 2. This process could apply retry logic in case of timeouts, and store the outcome of each request so that a report is available showing the progress and the number of errors.
In this case the table structure could be something like Job (JobId/Progress/Status) => DataItem (Data/AddedDate/Status/ErrorMessage).
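For illustration only, here is a minimal C# sketch of how those two entities could be modelled; the class and property names are just assumptions based on the columns above, not an existing schema:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entities for the intermediate store; names mirror the proposed columns.
public enum ItemStatus { Pending, Sent, Failed }

public class Job
{
    public Guid JobId { get; set; }
    public int Progress { get; set; }          // e.g. number or percentage of items processed
    public ItemStatus Status { get; set; }
    public List<DataItem> Items { get; set; } = new();
}

public class DataItem
{
    public long DataItemId { get; set; }
    public string Data { get; set; } = "";     // raw JSON payload to forward to Web API 2
    public DateTime AddedDate { get; set; }
    public ItemStatus Status { get; set; }
    public string? ErrorMessage { get; set; }
}
```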
Does adding this intermediate database to the process flow make sense to you? Would you recommend other performance-oriented options for this scenario, such as queues, caching, or NoSQL storage?
Any suggestion or architecture guideline will be appreciated. I am not an architect or an expert, but I would like to identify which roads and approaches can be taken. I think this is a common problem in data systems that process high data volumes.
Upvotes: 1
Views: 1285
Reputation: 22714
Your thoughts regarding architecture are a good starting point.
When the client sends requests to the internal Web API, that API should be considered an "ingestion service": it accepts the requests, stores them in intermediate persistent storage, and acknowledges that fact back to the client.
HTTP status code 202 (Accepted) is usually used in these situations, where the request has been received and accepted but not yet processed.
The response can (but does not have to) contain an identifier for the request, which can later be used to retrieve the processing status, for example via a GET /{id}/status endpoint.
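A minimal ASP.NET Core sketch of this accept-and-acknowledge pattern could look like the following; the IRequestStore abstraction and the route names are assumptions made for the example, not an existing API:

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical persistent store abstraction used only for this sketch.
public interface IRequestStore
{
    Task<Guid> SaveAsync(string payload);
    Task<string?> GetStatusAsync(Guid id);
}

[ApiController]
[Route("api/ingest")]
public class IngestionController : ControllerBase
{
    private readonly IRequestStore _store;

    public IngestionController(IRequestStore store) => _store = store;

    // Accept the payload, persist it for later forwarding to Web API 2,
    // and acknowledge immediately with 202 Accepted plus the request id.
    [HttpPost]
    public async Task<IActionResult> Post([FromBody] JsonElement payload)
    {
        var id = await _store.SaveAsync(payload.GetRawText());
        return AcceptedAtAction(nameof(GetStatus), new { id }, new { id });
    }

    // Lets the client poll the processing status of a previously accepted request.
    [HttpGet("{id}/status")]
    public async Task<IActionResult> GetStatus(Guid id)
    {
        var status = await _store.GetStatusAsync(id);
        if (status is null) return NotFound();
        return Ok(status);
    }
}
```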
As said, the internal Web API acts as an ingestion service. It has local (or, in the case of high availability, replicated) storage. If you want a robust solution (so that pending requests can survive a service crash), the storage should be persistent, not ephemeral.
The choice of storage depends on a lot of things, so without knowing the exact requirements and circumstances it is impossible to propose a concrete solution.
Here I want to emphasize one thing regarding retry: do not blindly apply retry logic to every request. Retrying only makes sense when certain preconditions are met, in particular that the failure is transient and that the request can safely be repeated (is idempotent).
For further details, please check out my article regarding retry policies in general.
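As a rough illustration only, a bounded retry with exponential backoff could look like this; the sendToWebApi2 delegate is a placeholder for the actual forwarding call and the attempt count and delays are arbitrary:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RetryHelper
{
    // Naive bounded retry with exponential backoff. Only appropriate when the
    // failure is transient and the forwarded request is idempotent.
    public static async Task<bool> ForwardWithRetryAsync(
        Func<Task<HttpResponseMessage>> sendToWebApi2,   // placeholder for the actual call
        int maxAttempts = 3)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                var response = await sendToWebApi2();
                if (response.IsSuccessStatusCode) return true;
            }
            catch (HttpRequestException) { /* network error: treat as transient */ }
            catch (TaskCanceledException) { /* timeout: treat as transient */ }

            if (attempt < maxAttempts)
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));  // 2s, 4s, ...
        }
        return false;   // give up; record the error on the data item instead
    }
}
```

In a real system a resilience library such as Polly is usually a better fit than a hand-rolled loop like this.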
Upvotes: 2