Reputation: 66
Most 3rd-party APIs are rate limited.
Let's say I'm fetching data from a REST service such as the Twitter API. I wish to display a list of tweets for a specific hashtag. For the purposes of this example, a large JSON response of 1,000 tweets is returned.
{ "tweets": { "1": { "tweet": "hello", "name": "ben", "date": "2018-01-01" }, "2": { "tweet": "hi", "name": "dave", "date": "2018-01-02" }, "3": { "tweet": "hey", "name": "holly", "date": "2018-01-03" }... } }
Each of the 1,000 tweets has its own properties: tweet, name, date.
If I called the Twitter API on every page refresh, we would soon hit the rate limit. I only need to grab a new list of tweets every few minutes.
A cache mechanism is clearly required.
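To illustrate, here is roughly what I mean, as a minimal Python sketch (fetch_tweets_from_api is a hypothetical stand-in for the real API call, and five minutes is just an example TTL):

```python
import json
import time
from pathlib import Path

CACHE_FILE = Path("tweets_cache.json")
CACHE_TTL = 300  # refresh at most every five minutes

def fetch_tweets_from_api():
    """Hypothetical stand-in for the real Twitter API call."""
    raise NotImplementedError

def get_tweets():
    # Serve the cached copy while it is still fresh.
    if CACHE_FILE.exists() and time.time() - CACHE_FILE.stat().st_mtime < CACHE_TTL:
        return json.loads(CACHE_FILE.read_text())
    # Otherwise hit the API once and refresh the cache on disk.
    data = fetch_tweets_from_api()
    CACHE_FILE.write_text(json.dumps(data))
    return data
```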
Which plan is better?
A) Store the entire JSON result on the local filesystem and parse when needed.
B) Store the entire JSON result in a database in one field, i.e. "json_response", and parse when needed.
C) Loop over each tweet, inserting each tweet as a separate row with a table field for each property (tweet, creator name, date, time, url, image), then use a SELECT query to return and rebuild the response.
If the entire result is stored as one blob, the database will become huge, as the JSON response repeats the property names and structure ("tweet", "name", "date", etc.) for every tweet.
If the data is split into corresponding fields, there is the additional cost of INSERT/SELECT queries, DB connections, and parsing time. And if a new property is added to the API response, it will not be stored unless a new DB field is mapped.
Upvotes: 2
Views: 1361
Reputation: 4028
Which plan is better
As so often, it depends.
If you have a limited set of response types, and you want to consume the results as is (without any additional filtering), use a standard cache solution (variant A). You could write one yourself, but there are many ready-to-use solutions in the wild. Refreshing the cache means retrieving all data from the API again.
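For instance, a sketch with the requests-cache package, one of many such off-the-shelf solutions (the URL is a placeholder, and five minutes is just an example expiry):

```python
import requests
import requests_cache

# Transparently cache all HTTP responses in a local SQLite file,
# expiring entries after five minutes.
requests_cache.install_cache("twitter_cache", backend="sqlite", expire_after=300)

# Placeholder URL; repeated calls within five minutes are served from the cache.
response = requests.get("https://api.example.com/tweets?hashtag=example")
tweets = response.json()
```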
If you store the results en bloc (variant B), you gain nothing over variant A but the database overhead, so it is not advisable.
If you split the response into separate records and store them individually (variant C), you get the best flexibility. Although the database operations cost performance, that might be outweighed by not needing to parse the results again and again. You can filter the records by any criterion, and for a refresh, you only need to fetch the data since the last update, saving bandwidth.
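A minimal sketch of variant C with SQLite (table and column names are assumptions based on your sample response; the numeric keys from the JSON are reused as primary keys):

```python
import sqlite3

conn = sqlite3.connect("tweets.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS tweets (
        id    INTEGER PRIMARY KEY,
        tweet TEXT,
        name  TEXT,
        date  TEXT
    )
""")

def store_tweets(tweet_map):
    # tweet_map is the inner "tweets" dict from the response.
    # One row per tweet; INSERT OR REPLACE keeps refreshes idempotent.
    conn.executemany(
        "INSERT OR REPLACE INTO tweets (id, tweet, name, date) VALUES (?, ?, ?, ?)",
        [(int(k), v["tweet"], v["name"], v["date"]) for k, v in tweet_map.items()],
    )
    conn.commit()

def tweets_since(last_date):
    # Filter by any criterion; here, everything newer than the last refresh.
    rows = conn.execute(
        "SELECT id, tweet, name, date FROM tweets WHERE date > ? ORDER BY date",
        (last_date,),
    )
    return rows.fetchall()
```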
The best option in your case might be to store the individual records in a NoSQL database. It saves you from parsing the results individually and caters for changes in the result structure automatically.
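A sketch with MongoDB, for example (connection details and database/collection names are assumptions):

```python
from pymongo import MongoClient

# Connection details are assumptions; adjust for your deployment.
client = MongoClient("mongodb://localhost:27017")
tweets = client["twitter_cache"]["tweets"]

def store_response(response):
    # Each tweet becomes its own document; extra fields added to the
    # API response later are stored automatically, no schema change needed.
    for key, value in response["tweets"].items():
        doc = {"_id": int(key), **value}
        tweets.replace_one({"_id": doc["_id"]}, doc, upsert=True)

# Filter by any field, just like with SQL rows.
bens_tweets = list(tweets.find({"name": "ben"}))
```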
Upvotes: 2