Rpj

Reputation: 6110

How to cache JSON data instead of accessing the REST endpoint

http://api.bitcoincharts.com/v1/markets.json (example endpoint)

I am planning to pull data from several REST endpoints (listed below). At times, access to some of them fails because of a connectivity error or because the service is unavailable. I am interested only in the latest snapshot of the data. To work around this, I would like to store the latest snapshot in a data store (preferably NoSQL, say Mongo or Redis) and change the application logic to always read from that store instead of the API endpoint. That way the application always gets predictable data. I intend to run CRON scripts that pull data regularly from these REST endpoints and write it to the store.

http://api.foo.com/v1/foo.json
http://api.bar.com/v1/bar.json
http://api.baz.com/v1/baz.json
  1. Is there a better approach to resolve this issue?
  2. Which store would be appropriate for storing the JSON as is and retrieving it for processing: Mongo or Redis?
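
For reference, the plan described above can be sketched roughly as follows. This is only a minimal sketch: a plain dict stands in for Mongo/Redis, and the names `refresh_snapshot`, `read_snapshot` and the `fetch` callable are hypothetical, chosen for illustration.

```python
import json
from typing import Callable, Dict, Optional

# Hypothetical in-memory stand-in for the real data store (Mongo, Redis, ...).
SnapshotStore = Dict[str, str]


def refresh_snapshot(store: SnapshotStore, name: str, fetch: Callable[[], str]) -> bool:
    """Run from the CRON script: overwrite the snapshot only if the fetch succeeds.

    Returns True when the snapshot was updated, False when the old one was kept.
    """
    try:
        body = fetch()
        json.loads(body)  # reject responses that are not valid JSON
    except (OSError, ValueError):
        # Connectivity errors (OSError) or malformed JSON (ValueError):
        # keep the previous snapshot untouched.
        return False
    store[name] = body
    return True


def read_snapshot(store: SnapshotStore, name: str) -> Optional[object]:
    """Application-side read: only touches the store, never the API."""
    raw = store.get(name)
    return None if raw is None else json.loads(raw)
```

In the CRON script, `fetch` could be something like `lambda: urllib.request.urlopen(url, timeout=10).read().decode()`; a failed pull then leaves the last good snapshot in place.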

Upvotes: 6

Views: 4884

Answers (3)

GregG

Reputation: 791

If you happen to be using Java, Ehcache (http://ehcache.org/) is well suited to this. We use it heavily for storing large JSON objects. The cache is configured via XML. It is basically a key/value store, and the timeout can be set for each key (cache entry) independently. It ships as a single .jar file. The CacheManager class handles all the details, so it takes just a couple of lines of code to implement.

Upvotes: 0

zenbeni

Reputation: 7213

You are using REST, so you can cache the HTTP requests / responses with a simple HTTP reverse proxy such as Apache HTTP Server, NGINX or Varnish. Why bother with NoSQL for a simple cache?

Of course MongoDB and Redis provide a lot more functionality, but do you really need it? Look at this other question: Caching JSON objects on server side

Upvotes: 10

mohamedrias

Reputation: 18576

  1. When you fetch data from the REST endpoints for the first time, store it in the caching layer and return it to the service. On subsequent requests, check whether the data exists in the cache; if it is not present, make the request to the REST endpoint and fetch the data.

    You need to set an expiry time when storing the data in the caching layer. This removes the need for the CRON job: instead of fetching all the data at once, you fetch it only when it is required, and at that point check whether it has expired in the cache.

  2. I would prefer Redis, as it is one of the best fits for a caching layer. It is a "NoSQL" key-value data store, unlike MongoDB, which is a disk-based document store. Similar to memcache, it can evict old data as you add new data. Redis is a fantastic choice if you want a highly scalable data store shared by multiple processes, multiple applications, or multiple servers. Unlike memcache, Redis provides powerful aggregate types like sorted sets and lists. It has a configurable persistence model, where it saves in the background at a specified interval, and it can be run in a master-slave setup. All of our Redis deployments run in master-slave, with the slave set to save to disk about every minute.

    As just an inter-process communication mechanism it is tough to beat. Its speed also makes it great as a caching layer.
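
The fetch-on-miss flow with an expiry time from point 1 could look roughly like this. It is a minimal sketch, not a definitive implementation: an in-memory dict stands in for Redis, and `TTLCache`, `get_or_fetch` and the injectable `clock` are hypothetical names used only for illustration. With Redis itself, the expiry would normally be handled by the server, for example through the `EX` option of the `SET` command.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache with per-entry expiry; a stand-in for Redis here."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expires_at, value)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], str]) -> str:
        """Return the cached value if still fresh, otherwise call fetch() and cache it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, not yet expired
        value = fetch()  # cache miss or expired entry: go to the REST endpoint
        self._entries[key] = (now + self._ttl, value)
        return value
```

Note that in this sketch a failed `fetch()` on an expired entry simply raises; if the goal is also to survive endpoint outages, the caller could catch the error and fall back to the stale value instead.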

Upvotes: 2
