good_evening

Reputation: 21759

Requesting a JSON file 1000 times every second: is it expensive?

Pseudocode:

function getjson() {
  $.getJSON('link_in_my_server_unique_for_each_user_json', function (data) {
    if (data.variable) {
      do_something();
    }
  });
}

setInterval(getjson, 1000); // pass the function itself, not its result; 1000 ms = every second

Is it expensive to have 1000 users retrieving a JSON file from my server every second and checking whether that file contains some variable?

Upvotes: 0

Views: 3314

Answers (9)

oxygen

Reputation: 6049

As long as the JSON file is a "static" request (the webserver serves the file directly as-is, without passing the request to some php/ruby/java/etc. process), you can determine whether your server can take it by simply benchmarking it.

This looks like a pre-cache to me (information which is to be requested is prepared by the server in advance and cached in the form of a structured response). Try using nginx for these types of requests. It also has optional modules for pre-gzipping your files (it will automatically renew the gzip cache if you change the original file). This will save you CPU (and, obviously, bandwidth).

Since you did not specify your file size, available bandwidth, CPU type, memory, etc., nobody can give you a yes/no answer to "Is it expensive?". It could be insignificant on a robust server with enough bandwidth (relative to your file size), or it could kill a shared-hosting or weak VPS setup.

Update: If you set the expiration headers properly and use a long Keep-Alive (persistent HTTP/TCP connection), you can benefit from the HTTP response code 304 Not Modified (i.e. the server sends only that status and some headers, not the whole file all over again). Scripts will not be involved, serving the file will not be involved (unless it changes), TCP reconnection will not happen, and disk reads will not happen (file stats are cached, at least by the OS). nginx might be the best bet for raw performance on static file checks/reads/serving.
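
A minimal client-side sketch (assuming jQuery, as in the question's code) of how the poll could take advantage of those 304s: with ifModified set, jQuery sends If-Modified-Since on repeat requests, so an unchanged file comes back as just a status and headers. The URL is the question's placeholder.

function getjson() {
  $.ajax({
    url: 'link_in_my_server_unique_for_each_user_json',
    dataType: 'json',
    ifModified: true,            // send If-Modified-Since so the server can answer 304
    success: function (data) {
      if (data && data.variable) {   // data can be empty on a 304 response
        do_something();
      }
    }
  });
}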

Upvotes: 0

ChrisLively

Reputation: 88072

This really looks like you're solving the wrong problem.

I'd question why you feel the need to have every single browser hit your site every single second they are viewing it. Maybe some context would help.

Now, if you truly need that capability then yes, you can do it. However, it's all about cost. You'll need to test it to determine exactly how many web servers you need in a load-balanced configuration. Then go back to whoever came up with the requirements and let them know the costs associated with that "feature".

Upvotes: 0

ceejayoz

Reputation: 180065

Chances are that yes, this'll kill your average shared hosting or small VPS.

It's entirely possible to offload most of this to a system like CloudFlare or AWS CloudFront with a short (~1 second) cache expiration. Most users would get it directly from the cache, saving your server most of the work.
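
For illustration, a rough server-side sketch of that short cache lifetime (Node.js is an assumption here, since the question does not say what serves the JSON):

var http = require('http');

http.createServer(function (req, res) {
  res.writeHead(200, {
    'Content-Type': 'application/json',
    // s-maxage=1 lets a shared cache (CloudFlare, CloudFront) keep the response for
    // about a second, so most of the 1000 clients hit the cache instead of the origin
    'Cache-Control': 'public, s-maxage=1'
  });
  res.end(JSON.stringify({ variable: false })); // placeholder payload
}).listen(8080);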

Upvotes: 8

slashingweapon

Reputation: 11317

First, you don't need to guess. You can measure.

  1. Use the developer tools available for your browser (Firebug for Firefox, Developer Tools for Chrome, etc.) and watch the requests to see how long each one takes.
  2. Observe the load on your server. If it is anything other than 0%, you're never going to see 1000 sessions running simultaneously.
  3. Repeat the steps above, but with a bunch of browsers open. It should be easy to get 20 browser windows open and running (or substitute a scripted load test like the sketch after this list).
  4. Remember that, as the server load gets past 50%, your performance will become increasingly non-linear. Once your system is blocking/thrashing a lot, you can't expect to gainfully add any more clients.
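
If opening browsers by hand is impractical, a rough load-test sketch can stand in for them. This assumes Node.js 18+ (for the built-in fetch); the URL is the question's placeholder.

var URL_UNDER_TEST = 'http://localhost/link_in_my_server_unique_for_each_user_json';
var CONCURRENCY = 100;   // number of simultaneous requests per batch

async function timedRequest() {
  var start = Date.now();
  await fetch(URL_UNDER_TEST);   // one polling request, like a single browser tick
  return Date.now() - start;
}

(async function () {
  var requests = [];
  for (var i = 0; i < CONCURRENCY; i++) {
    requests.push(timedRequest());
  }
  var times = await Promise.all(requests);
  var avg = times.reduce(function (a, b) { return a + b; }, 0) / times.length;
  console.log(CONCURRENCY + ' concurrent requests, average ' + avg.toFixed(1) + ' ms');
})();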

Once you have a baseline measurement, you can think about optimization -- do you need it and how big a problem do you have on your hands? Some of the usual solutions:

  1. If at all possible, serve a static file or a piece of data from your APC cache.
  2. If your data is cacheable but you have multiple web servers, consider a solution like memcached, MongoDB, or some other centralized very-fast key-based retrieval system.
  3. If the data is dynamically retrieved from a database, consider using persistent database connections.
  4. If the CPU load is high per request, you probably have something expensive in your code path. Try to optimize the code path for that particular request, even if you have to hand-craft a special controller for it and bypass your usual framework.
  5. If the network latency is high per request, use HTTP headers to try to convince the client and server to keep the connection open (see the sketch after this list).
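
As an illustration of that last point, a small sketch (Node.js assumed, since the question's stack is not specified) that keeps idle connections open longer than the polling interval, so repeat requests skip the TCP reconnect:

var http = require('http');

var server = http.createServer(function (req, res) {
  res.writeHead(200, {
    'Content-Type': 'application/json',
    'Connection': 'keep-alive'   // advertise the persistent connection explicitly
  });
  res.end(JSON.stringify({ variable: false })); // placeholder payload
});

// keep idle sockets open for 5 s, comfortably longer than the 1 s poll interval
server.keepAliveTimeout = 5000;
server.listen(8080);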

Finally, if the performance you want seems very out of reach, you'll need to consider an architectural change. Using WebSockets would likely be a very different code path, but could conceivably result in far better performance. I'd have to know more about what you're doing with link_in_my_server_unique_for_each_user_json.
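
A minimal browser-side sketch of what the WebSocket version could look like; the endpoint URL is an assumption standing in for link_in_my_server_unique_for_each_user_json, and the server would need a matching WebSocket backend:

var socket = new WebSocket('ws://example.com/updates_for_this_user');

socket.onmessage = function (event) {
  var data = JSON.parse(event.data);   // the server pushes JSON only when something changes
  if (data.variable) {
    do_something();                    // same handler the polling version called
  }
};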

Upvotes: 20

Roy

Reputation: 428

Unless the variable is unique and relevant to the specific user, I would suggest one of these solutions:

  • Cache it using a cloud service
  • Push it to the user using a WebSocket connection
  • Offload the work to the client using JavaScript
  • Use long polling instead of interval polling (see the sketch after this list)
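
For the long-polling option, a rough client-side sketch (jQuery, matching the question's code); the server is assumed to hold each request open until the variable changes or a timeout elapses:

function poll() {
  $.getJSON('link_in_my_server_unique_for_each_user_json', function (data) {
    if (data.variable) {
      do_something();
    }
  }).always(poll);   // start the next long-held request only when the previous one finishes
}

poll();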

The age of polling for data is all but over, and there is often a better and more cost-effective solution.

Upvotes: 1

Parris

Reputation: 18438

How about you don't check whether the file has some variable and instead tell your front end that a variable has been created? Observer pattern at work!
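
A tiny sketch of the observer idea in plain JavaScript (the function names here are illustrative, not from any particular library): whatever code creates the variable notifies subscribers directly, so nothing has to poll a file.

var observers = [];

function onVariableCreated(callback) {
  observers.push(callback);                 // register an observer
}

function createVariable(value) {
  observers.forEach(function (callback) {   // notify everyone who subscribed
    callback(value);
  });
}

// the front end subscribes once; no polling involved
onVariableCreated(function (value) {
  do_something();
});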

There exist a few libraries that can do PHP WebSocket-type stuff. They usually involve some long-polling strategy.

Check out: http://code.google.com/p/phpwebsocket/

Upvotes: 0

VVV

Reputation: 7593

Did you think about using a WebSocket?

It's kind of the same principle Stack Overflow uses. You can push data to the user when the variable actually changes. But your server needs to be set up correctly to do so.

Upvotes: 2

Mirko Adari

Reputation: 5103

So we are talking about at least 1k requests per second. That is already considered a fairly high load even for powerful machines. So what has to be done for each request?

  • Connection initialization
  • Processing request
  • Executing server side logic

With this scenario you are pretty much consuming all the available resources (including file I/O). You are also spending most of the web server's resources on some additional value that is probably not your foremost feature.

What would be a better approach?

You want to react to a change instead of polling for it. So for each user we would have a channel that contains their events, and when an event occurs we want the server to notify us. Unfortunately, as mentioned in another answer, this is not PHP's strongest suit.

For the client side you can look at SockJS and pair it with either Node.js or Vert.x. You get all the architecture needed for free and it is not very hard to set up. SockJS also comes with a nice set of protocol tests, so it's quite easy to have your own server-side implementation.
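
A minimal SockJS client sketch (the endpoint URL is an assumption; the server side would be the Node.js or Vert.x SockJS provider mentioned above):

var sock = new SockJS('http://example.com/user_events');

sock.onmessage = function (e) {
  var data = JSON.parse(e.data);   // the provider pushes an event when something changes
  if (data.variable) {
    do_something();                // react to the push instead of polling for it
  }
};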

With these changes you will have only one request per user to the SockJS provider, and you can scale it independently if needed. Also, the primary service is not interrupted by JSON calls. So we end up with:

  • One request per page load to the SockJS provider
  • One request from PHP to the SockJS provider per change

It does make authentication a bit trickier, but you can have a private key known by both the PHP application and the SockJS provider and use it to sign a cookie. Then you can pass that cookie with your JSON request.
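
For illustration, a rough sketch of that cookie signing (Node.js on the SockJS-provider side is an assumption; the same secret would live in the PHP application):

var crypto = require('crypto');

// shared secret known to both the PHP application and the SockJS provider (assumption)
var SHARED_SECRET = 'replace-with-a-real-private-key';

function signCookie(value) {
  var signature = crypto.createHmac('sha256', SHARED_SECRET)
                        .update(value)
                        .digest('hex');
  return value + '.' + signature;   // e.g. "user42.3f5a..."
}

function verifyCookie(signedValue) {
  var dot = signedValue.lastIndexOf('.');
  if (dot === -1) return null;
  var value = signedValue.slice(0, dot);
  return signCookie(value) === signedValue ? value : null;   // null means tampered or invalid
}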

Upvotes: 1

Jeff Watkins

Reputation: 6359

If you can't cache, perhaps consider the COMET pattern, so you'd have 1,000 long-held calls rather than 1,000 calls a second, overall servicing less traffic but delivering the desired result. See http://en.wikipedia.org/wiki/Comet_%28programming%29

Upvotes: 6
