Reputation: 1287
I want to use data.table::fwrite for rapid-fire storage and retrieval of state in the form of text logs. These are updated through a mobile app that makes plumber API calls into R endpoints. The mobile apps may fire many API calls per second, and there is a chance of the same row being modified by two calls within ~0.5 seconds of each other. I am avoiding database reads and writes due to delays of 1-2 seconds per API call (fwrite can do the same job in 0.5 seconds the first time, and subsequent calls finish in under 20 ms).
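For reference, the pattern I have in mind looks roughly like this (the endpoint names and log path are illustrative, not my real ones):

```r
library(plumber)
library(data.table)

log_path <- "state_log.csv"  # illustrative path

#* Append one state row to the log
#* @post /log_state
function(id, value) {
  row <- data.table(id = id, value = value, ts = as.numeric(Sys.time()))
  # append = TRUE adds rows to the existing file;
  # the header is written only when the file is first created
  fwrite(row, log_path, append = TRUE)
  list(status = "ok")
}

#* Read the current state log back
#* @get /read_state
function() {
  fread(log_path)
}
```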
My question is:
Will the fwrite/fread combination work in a higher-traffic scenario, or do I have to look for ways of locking the file to avoid corruption? Are there any ways of locking a file for reading or writing?
Upvotes: 4
Views: 294
Reputation: 17517
Will the fwrite/fread combination work in a higher-traffic scenario, or do I have to look for ways of locking the file to avoid corruption?
The answer is "it depends."
If you're hosting the app using a simple hosting model in which all traffic hits the same, singleton R process, then you're likely going to be OK even in high-traffic scenarios. The caveat here is that this would not hold up if you were doing any kind of internal process forking in your API (or if data.table forks internally; I'm not sure, as I've never used it).
However, if you're hosting the application with multiple R processes and a load balancer in front of them, then you're going to run into trouble with multiple processes trying to write into the same file.
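To answer the file-locking part of the question: if you do stay with a shared file across processes, an OS-level advisory lock can at least serialize the writers. A sketch using the filelock package (the paths and timeout here are illustrative, and this guards writes, not crashes mid-write):

```r
library(filelock)
library(data.table)

log_path  <- "state_log.csv"            # illustrative
lock_path <- paste0(log_path, ".lock")  # separate lock file

append_row <- function(row) {
  # Block up to 5000 ms waiting for an exclusive lock;
  # lock() returns NULL if the timeout is reached
  lck <- lock(lock_path, exclusive = TRUE, timeout = 5000)
  if (is.null(lck)) stop("could not acquire file lock")
  on.exit(unlock(lck))
  fwrite(row, log_path, append = TRUE)
}

append_row(data.table(id = "a", value = "1", ts = as.numeric(Sys.time())))
```

Every process touching the file has to cooperate and take the same lock; a process that writes without it can still corrupt the log.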
The typical advice for scaling a Plumber API is to scale horizontally by adding more R processes. So I'd encourage you to find a design that continues to work if/when you end up running multiple Plumber processes in parallel. You could centralize state in a remote database, or even locally using SQLite; just be aware that SQLite allows only one writer at a time, so configure it (e.g. WAL mode plus a busy timeout) so concurrent writers queue rather than fail.
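As a concrete sketch of the local-SQLite option: WAL mode lets readers proceed while the single writer works, and `busy_timeout` makes a second writer wait for the lock instead of erroring immediately. The database path and table are illustrative:

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "state.db")  # illustrative path

# WAL journal mode: readers don't block the (single) writer
dbGetQuery(con, "PRAGMA journal_mode = WAL;")
# If another process holds the write lock, wait up to 5000 ms
dbExecute(con, "PRAGMA busy_timeout = 5000;")

dbExecute(con, "CREATE TABLE IF NOT EXISTS state (id TEXT, value TEXT, ts REAL)")
dbExecute(con, "INSERT INTO state VALUES (?, ?, ?)",
          params = list("a", "1", as.numeric(Sys.time())))

dbDisconnect(con)
```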
I would certainly not expect a delay of 1-2 seconds for a DB round trip. It might be worth investigating your DB hardware/software or checking for latency in the network. You could also look at the pool package as a way of keeping database connections open and available to your API; I'd guess that would dramatically reduce the time your DB writes require, since much of that 1-2 seconds is likely connection setup rather than the write itself.
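With pool, the connection is established once when the Plumber process starts and reused across requests rather than reopened per call. A sketch (driver, database name, and table are placeholders for your real setup):

```r
library(pool)
library(DBI)

# Create the pool once at process startup, outside any endpoint
pool <- dbPool(
  RSQLite::SQLite(),   # swap in your real driver, e.g. RPostgres::Postgres()
  dbname = "state.db"  # illustrative
)

#* Log one state row, reusing a pooled connection
#* @post /log_state
function(id, value) {
  dbExecute(pool, "INSERT INTO state VALUES (?, ?, ?)",
            params = list(id, value, as.numeric(Sys.time())))
  list(status = "ok")
}

# On shutdown: poolClose(pool)
```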
Upvotes: 2