Reputation: 9407
I've got some bad bots targeting my website and I need to dynamically handle the IP addresses those bots come from. It's a pretty high-traffic site; we get a couple of million pageviews per day, which is why we're using 4 load-balanced servers. We don't use any caching (besides assets) because most of our responses are unique.
Code-wise it's a pretty small PHP website, which does no database queries and one XML request per pageview. The XML request gets a pretty fast response.
I've developed a script to (very frequently) analyse which IP addresses are making abusive requests, and I want to handle requests from those IPs differently for a certain amount of time. The abusive IPs change a lot, so I need to block different IPs every couple of minutes.
So: I see IP xx.xx.xx.xx being abusive, I record this somewhere, and then I want to give that IP special treatment on every request it makes for the next x minutes. I need to do this in a fast way, because I don't want to slow down the server and make legitimate users suffer for it.
Solution 1: file
Writing the abusive IPs down in a file and then reading that file for every request seems too slow. Would you agree?
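If I went this route, the per-request check could be as simple as the sketch below (the blocklist path is a hypothetical name; one IP per line):

    <?php
    // Per-request check against a newline-separated blocklist file.
    // file() re-reads the whole file on every request, which is the cost in question.
    $blocklistPath = '/var/www/blocked_ips.txt'; // assumed location

    $blockedIps = @file($blocklistPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($blockedIps !== false && in_array($_SERVER['REMOTE_ADDR'], $blockedIps, true)) {
        http_response_code(429); // or redirect/serve the "special treatment" page
        exit;
    }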
Solution 2: PHP include
I could let my analysis script write a PHP include file, which the PHP engine would then include on every request. But I can imagine that, while the file is being written, requests arriving at that exact moment could error out or include a half-written file.
I could solve that potential problem by writing the new file under a different name and then changing a symlink to point at it (which should be fast).
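To make that concrete, a sketch of what the include side could look like (file names are illustrative):

    <?php
    // blocked_ips.php -- regenerated every few minutes by the analysis script.
    // Its entire contents would be something like:
    //   <?php return array('198.51.100.7' => true, '203.0.113.42' => true);

    // Per-request check:
    $blocked = include '/var/www/blocked_ips.php';
    if (isset($blocked[$_SERVER['REMOTE_ADDR']])) {
        http_response_code(429);
        exit;
    }

An alternative to the symlink swap would be writing to blocked_ips.php.tmp and then calling rename(), which is atomic on the same filesystem, so a request can never include a half-written file.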
Solution 3: htaccess
Another way to separate out the abusers would be to write an .htaccess file that blocks or redirects them. This might be the most efficient way, but it means rewriting the .htaccess file every x minutes.
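For example, the generated file could contain plain deny rules like these (Apache 2.2-style directives; the IPs are placeholders):

    # .htaccess fragment, regenerated every x minutes by the analysis script
    Order Allow,Deny
    Allow from all
    Deny from 198.51.100.7
    Deny from 203.0.113.42

Apache re-reads .htaccess on every request anyway, so the rules take effect as soon as the file is rewritten.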
I'd love to hear some thoughts/reactions on my proposed solutions, especially concerning speed.
Upvotes: 1
Views: 1479
Reputation: 2561
I would seriously consider putting up another server that holds the (constantly changing) block list in memory and serves it to the front-end servers. I implemented such a solution using Node.JS and found the implementation easy and the performance very good. memcached could also be used, but I never tried it.
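With memcached, the front-end check could look something like this (the server address and key scheme are my assumptions):

    <?php
    // Look up the requesting IP in a shared memcached instance.
    $mc = new Memcached();
    $mc->addServer('10.0.0.5', 11211); // assumed address of the block-list server

    // The analysis script would set keys with a TTL, e.g.
    //   $mc->set('blocked:' . $ip, 1, 600);
    // so punishments expire on their own after x minutes.

    if ($mc->get('blocked:' . $_SERVER['REMOTE_ADDR'])) {
        http_response_code(429);
        exit;
    }

The TTL is the nice part: you never have to clean up expired punishments yourself.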
Upvotes: 0
Reputation: 9407
For the record I've finally decided to go for (my own proposed) solution number 2, generating a PHP file that is included on every page request.
The complete solution is as follows: a Python script analyses the access log every x minutes and doles out "punishments" to certain IP addresses. All currently running punishments are written into a fairly small (<1 KB) PHP file, which is included on every page request. Directly after the PHP file is generated, an rsync job pushes it out to the other 3 servers behind the load balancer.
In the Python script that generates the PHP file, I first build the complete contents of the file as a string. I then open, write, and close the file in quick succession, so the file is locked for the shortest possible period.
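A sketch of that write pattern (the punishment bookkeeping is simplified to a plain list of IPs):

    def write_blocklist(punished_ips, path='/var/www/blocked_ips.php'):
        """Build the whole PHP file in memory, then write it in one short burst."""
        lines = ['<?php', 'return array(']
        for ip in sorted(punished_ips):
            lines.append("    '%s' => true," % ip)
        lines.append(');')
        contents = '\n'.join(lines) + '\n'

        # Open, write, close in quick succession so the file is busy
        # for as short a time as possible.
        with open(path, 'w') as f:
            f.write(contents)

        # (Writing to path + '.tmp' and then os.rename()-ing it into place
        # would make the swap fully atomic, but the quick write sufficed here.)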
Upvotes: 1
Reputation: 7507
What about dynamically configuring iptables to block the bad IPs? I don't see any reason to do the "firewalling" in PHP...
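For example, combining iptables with an ipset that has a timeout would make entries expire on their own (the set name and timeout are examples):

    # One-time setup: a set whose entries auto-expire after 600 seconds
    ipset create badbots hash:ip timeout 600
    iptables -I INPUT -m set --match-set badbots src -j DROP

    # The analysis script then only ever adds offenders:
    ipset add badbots 198.51.100.7

This drops the packets before they ever reach PHP, though it would have to run on each of the 4 servers (or on the load balancer itself).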
Upvotes: 1