Ben
Ben

Reputation: 66

Cache API Response In Database Or Filesystem

Most 3rd party API's are rate limited.

Let's say I'm fetching data from a REST service such as the Twitter API. I wish to display a list of tweets for a specific hashtag. For the purposes of this example, a large JSON response of 1,000 tweets is returned.


    {
      "tweets": {
        "1": {
          "tweet": "hello",
          "name": "ben",
          "date": "2018-01-01"
        },
        "2": {
          "tweet": "hi",
          "name": "dave",
          "date": "2018-01-02"
        },
        "3": {
          "tweet": "hey",
          "name": "holly",
          "date": "2018-01-03"
        }...
      }
    }

Each of the 1,000 tweets has their own properties; tweet, name, date.

If I called the Twitter Api on every page refresh we would soon hit the rate limit. I only need to grab a new list of tweets few minutes.

A cache mechanism is clearly required.

Which plan is better;

A) Store the entire JSON result on the local filesystem and parse when needed.

B) Store the entire JSON result in a database in one field ie "json_response" and parse when needed.

C) Loop over each tweet, inserting each tweet as a separate row with a table field for each parameter (tweet, creator name, date, time, url, image). Then using a SELECT query to return and rebuild the response.

If the entire result is stored as one, the database will become huge as the JSON response contains the properties and JSON structure "tweet, name, date" etc for each tweet.

If the data is split into corresponding fields, there is the additional INSERT/SELECT/DB connections and parsing time required. If new data is passed to the API result it will not be stored unless a new DB field is mapped.

Upvotes: 2

Views: 1361

Answers (1)

nibra
nibra

Reputation: 4028

Which plan is better

As so often, it depends.

Uniform Responses

If you have a limited set of response types, and you want to consume the results as is (without any additional filtering), use a standard cache solution (variant A). You could write one yourself, but there many ready-to-use solutions in the wild. Refreshing the cache means to retrieve all data from the API again.

Simple Database

If you store the results en bloc (variant B), you gain nothing but the database overhead over variant A. So it is not recommendable to do this.

Structured Database

If you split the response into separate records and store them individually (variant C), you get the best flexibility. Although database transactions need performance, that might be outweighed by not needing to parse the results again and again. You can filter the records by any criterion. For a refresh, you just need to fetch the data since the last update, saving bandwidth.

NoSQL Database

The best option in your case might be to store the individual records in a NoSQL database. It saves you from parsing the results individually and caters for changes in the result structure automatically.

Upvotes: 2

Related Questions